public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] tree-optimization/114589 - remove profile based sink heuristics
@ 2024-05-15  9:38 Richard Biener
  2024-05-18  3:27 ` Hans-Peter Nilsson
  2024-06-14 13:08 ` Li, Pan2
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Biener @ 2024-05-15  9:38 UTC (permalink / raw)
  To: gcc-patches

The following removes the profile based heuristic limiting sinking
and instead uses post-dominators to avoid sinking to places that
are executed under the same conditions as the earlier location which
the profile based heuristic should have guaranteed as well.

To avoid regressing this moves the empty-latch check to cover all
sink cases.

It also stream-lines the resulting select_best_block a bit but avoids
adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
starts execute failing with this on x86_64 with -m32 because the
(float)i * 9.9999...e-7 compute is sunk across a STOP causing it
to be no longer spilled and thus the compare failing due to excess
precision.  The patch adds -ffloat-store to avoid this, following
other similar testcases.

This change doesn't fix the testcase in the PR on itself.

Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.

	PR tree-optimization/114589
	* tree-ssa-sink.cc (select_best_block): Remove profile-based
	heuristics.  Instead reject sink locations that sink
        to post-dominators.  Move empty latch check here from
	statement_sink_location.  Also consider early_bb for the
	loop depth check.
	(statement_sink_location): Remove superfluous check.  Remove
	empty latch check.
	(pass_sink_code::execute): Compute/release post-dominators.

	* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
	excess precision when not spilling.
---
 gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
 gcc/tree-ssa-sink.cc                     | 62 ++++++++----------------
 2 files changed, 20 insertions(+), 43 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 b/gcc/testsuite/gfortran.dg/streamio_9.f90
index b6bddb973f8..f29ded6ba54 100644
--- a/gcc/testsuite/gfortran.dg/streamio_9.f90
+++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-options "-ffloat-store" }
 ! PR29053 Stream IO test 9.
 ! Contributed by Jerry DeLisle <jvdelisle@gcc.gnu.org>.
 ! Test case derived from that given in PR by Steve Kargl.
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2f90acb7ef4..2188b7523c7 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 
    We want the most control dependent block in the shallowest loop nest.
 
-   If the resulting block is in a shallower loop nest, then use it.  Else
-   only use the resulting block if it has significantly lower execution
-   frequency than EARLY_BB to avoid gratuitous statement movement.  We
-   consider statements with VOPS more desirable to move.
-
-   This pass would obviously benefit from PDO as it utilizes block
-   frequencies.  It would also benefit from recomputing frequencies
-   if profile data is not available since frequencies often get out
-   of sync with reality.  */
+   If the resulting block is in a shallower loop nest, then use it.  */
 
 static basic_block
 select_best_block (basic_block early_bb,
@@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-  int threshold;
 
   while (temp_bb != early_bb)
     {
+      /* Walk up the dominator tree, hopefully we'll find a shallower
+	 loop nest.  */
+      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+
       /* If we've moved into a lower loop nest, then that becomes
 	 our best block.  */
       if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
 	best_bb = temp_bb;
-
-      /* Walk up the dominator tree, hopefully we'll find a shallower
- 	 loop nest.  */
-      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
     }
 
   /* Placing a statement before a setjmp-like function would be invalid
@@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
     return best_bb;
 
+  /* Do not move stmts to post-dominating places on the same loop depth.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
+    return early_bb;
+
+  /* If the latch block is empty, don't make it non-empty by sinking
+     something into it.  */
+  if (best_bb == early_bb->loop_father->latch
+      && empty_block_p (best_bb))
+    return early_bb;
+
   /* Avoid turning an unconditional read into a conditional one when we
      still might want to perform vectorization.  */
   if (best_bb->loop_father == early_bb->loop_father
@@ -233,28 +234,7 @@ select_best_block (basic_block early_bb,
       && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
     return early_bb;
 
-  /* Get the sinking threshold.  If the statement to be moved has memory
-     operands, then increase the threshold by 7% as those are even more
-     profitable to avoid, clamping at 100%.  */
-  threshold = param_sink_frequency_threshold;
-  if (gimple_vuse (stmt) || gimple_vdef (stmt))
-    {
-      threshold += 7;
-      if (threshold > 100)
-	threshold = 100;
-    }
-
-  /* If BEST_BB is at the same nesting level, then require it to have
-     significantly lower execution frequency to avoid gratuitous movement.  */
-  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
-      /* If result of comparsion is unknown, prefer EARLY_BB.
-	 Thus use !(...>=..) rather than (...<...)  */
-      && !(best_bb->count * 100 >= early_bb->count * threshold))
-    return best_bb;
-
-  /* No better block found, so return EARLY_BB, which happens to be the
-     statement's original block.  */
-  return early_bb;
+  return best_bb;
 }
 
 /* Given a statement (STMT) and the basic block it is currently in (FROMBB),
@@ -452,13 +432,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
     return false;
   
   sinkbb = select_best_block (frombb, sinkbb, stmt);
-  if (!sinkbb || sinkbb == frombb)
-    return false;
-
-  /* If the latch block is empty, don't make it non-empty by sinking
-     something into it.  */
-  if (sinkbb == frombb->loop_father->latch
-      && empty_block_p (sinkbb))
+  if (sinkbb == frombb)
     return false;
 
   *togsi = gsi_after_labels (sinkbb);
@@ -822,6 +796,7 @@ pass_sink_code::execute (function *fun)
   mark_dfs_back_edges (fun);
   memset (&sink_stats, 0, sizeof (sink_stats));
   calculate_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_POST_DOMINATORS);
 
   virtual_operand_live vop_live;
 
@@ -833,6 +808,7 @@ pass_sink_code::execute (function *fun)
 
   statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
   statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
+  free_dominance_info (CDI_POST_DOMINATORS);
   remove_fake_exit_edges ();
   loop_optimizer_finalize ();
 
-- 
2.35.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
  2024-05-15  9:38 [PATCH] tree-optimization/114589 - remove profile based sink heuristics Richard Biener
@ 2024-05-18  3:27 ` Hans-Peter Nilsson
  2024-06-14 13:08 ` Li, Pan2
  1 sibling, 0 replies; 5+ messages in thread
From: Hans-Peter Nilsson @ 2024-05-18  3:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

> Date: Wed, 15 May 2024 11:38:58 +0200 (CEST)
> From: Richard Biener <rguenther@suse.de>

> The following removes the profile based heuristic limiting sinking
> and instead uses post-dominators to avoid sinking to places that
> are executed under the same conditions as the earlier location which
> the profile based heuristic should have guaranteed as well.
> 
> To avoid regressing this moves the empty-latch check to cover all
> sink cases.
> 
> It also stream-lines the resulting select_best_block a bit but avoids
> adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
> starts execute failing with this on x86_64 with -m32 because the
> (float)i * 9.9999...e-7 compute is sunk across a STOP causing it
> to be no longer spilled and thus the compare failing due to excess
> precision.  The patch adds -ffloat-store to avoid this, following
> other similar testcases.
> 
> This change doesn't fix the testcase in the PR on itself.

It may come as no surprise that this patch (commit
r15-518-g99b1daae18c095) caused a regression for some codes,
for some targets.

I entered PR115144 for the one that came to my attention.
TL;DR: not sure what to do about it, if anything; it
corresponds to the random_bitstring function in
gcc.c-torture/execute/arith-rand-ll.c compiled for cris-elf
with -O2 -march=v10 but there's no regression for coremark.
The random_bitstring code does intense operations on "long
long", i.e. lots of int64_t two-register operations and
arithmetic library calls.

I'd be grateful if you had a quick glance at
random_bitstring in gcc.c-torture/execute/arith-rand-ll.c in
case that code rings a bell related to the r15-518 change;
maybe there's a trivial improvement you see.

brgds, H-P

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
  2024-05-15  9:38 [PATCH] tree-optimization/114589 - remove profile based sink heuristics Richard Biener
  2024-05-18  3:27 ` Hans-Peter Nilsson
@ 2024-06-14 13:08 ` Li, Pan2
  2024-06-14 13:10   ` Richard Biener
  1 sibling, 1 reply; 5+ messages in thread
From: Li, Pan2 @ 2024-06-14 13:08 UTC (permalink / raw)
  To: Richard Biener, gcc-patches

Hi Richard,

Here is one PR related to this patch (by git bisect), details as below.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458

I am still trying to narrow down which change caused this failure or any hints here?

Pan

-----Original Message-----
From: Richard Biener <rguenther@suse.de> 
Sent: Wednesday, May 15, 2024 5:39 PM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

The following removes the profile based heuristic limiting sinking
and instead uses post-dominators to avoid sinking to places that
are executed under the same conditions as the earlier location which
the profile based heuristic should have guaranteed as well.

To avoid regressing this moves the empty-latch check to cover all
sink cases.

It also stream-lines the resulting select_best_block a bit but avoids
adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
starts execute failing with this on x86_64 with -m32 because the
(float)i * 9.9999...e-7 compute is sunk across a STOP causing it
to be no longer spilled and thus the compare failing due to excess
precision.  The patch adds -ffloat-store to avoid this, following
other similar testcases.

This change doesn't fix the testcase in the PR on itself.

Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.

	PR tree-optimization/114589
	* tree-ssa-sink.cc (select_best_block): Remove profile-based
	heuristics.  Instead reject sink locations that sink
        to post-dominators.  Move empty latch check here from
	statement_sink_location.  Also consider early_bb for the
	loop depth check.
	(statement_sink_location): Remove superfluous check.  Remove
	empty latch check.
	(pass_sink_code::execute): Compute/release post-dominators.

	* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
	excess precision when not spilling.
---
 gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
 gcc/tree-ssa-sink.cc                     | 62 ++++++++----------------
 2 files changed, 20 insertions(+), 43 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 b/gcc/testsuite/gfortran.dg/streamio_9.f90
index b6bddb973f8..f29ded6ba54 100644
--- a/gcc/testsuite/gfortran.dg/streamio_9.f90
+++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
@@ -1,4 +1,5 @@
 ! { dg-do run }
+! { dg-options "-ffloat-store" }
 ! PR29053 Stream IO test 9.
 ! Contributed by Jerry DeLisle <jvdelisle@gcc.gnu.org>.
 ! Test case derived from that given in PR by Steve Kargl.
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2f90acb7ef4..2188b7523c7 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 
    We want the most control dependent block in the shallowest loop nest.
 
-   If the resulting block is in a shallower loop nest, then use it.  Else
-   only use the resulting block if it has significantly lower execution
-   frequency than EARLY_BB to avoid gratuitous statement movement.  We
-   consider statements with VOPS more desirable to move.
-
-   This pass would obviously benefit from PDO as it utilizes block
-   frequencies.  It would also benefit from recomputing frequencies
-   if profile data is not available since frequencies often get out
-   of sync with reality.  */
+   If the resulting block is in a shallower loop nest, then use it.  */
 
 static basic_block
 select_best_block (basic_block early_bb,
@@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-  int threshold;
 
   while (temp_bb != early_bb)
     {
+      /* Walk up the dominator tree, hopefully we'll find a shallower
+	 loop nest.  */
+      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+
       /* If we've moved into a lower loop nest, then that becomes
 	 our best block.  */
       if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
 	best_bb = temp_bb;
-
-      /* Walk up the dominator tree, hopefully we'll find a shallower
- 	 loop nest.  */
-      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
     }
 
   /* Placing a statement before a setjmp-like function would be invalid
@@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
     return best_bb;
 
+  /* Do not move stmts to post-dominating places on the same loop depth.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
+    return early_bb;
+
+  /* If the latch block is empty, don't make it non-empty by sinking
+     something into it.  */
+  if (best_bb == early_bb->loop_father->latch
+      && empty_block_p (best_bb))
+    return early_bb;
+
   /* Avoid turning an unconditional read into a conditional one when we
      still might want to perform vectorization.  */
   if (best_bb->loop_father == early_bb->loop_father
@@ -233,28 +234,7 @@ select_best_block (basic_block early_bb,
       && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
     return early_bb;
 
-  /* Get the sinking threshold.  If the statement to be moved has memory
-     operands, then increase the threshold by 7% as those are even more
-     profitable to avoid, clamping at 100%.  */
-  threshold = param_sink_frequency_threshold;
-  if (gimple_vuse (stmt) || gimple_vdef (stmt))
-    {
-      threshold += 7;
-      if (threshold > 100)
-	threshold = 100;
-    }
-
-  /* If BEST_BB is at the same nesting level, then require it to have
-     significantly lower execution frequency to avoid gratuitous movement.  */
-  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
-      /* If result of comparsion is unknown, prefer EARLY_BB.
-	 Thus use !(...>=..) rather than (...<...)  */
-      && !(best_bb->count * 100 >= early_bb->count * threshold))
-    return best_bb;
-
-  /* No better block found, so return EARLY_BB, which happens to be the
-     statement's original block.  */
-  return early_bb;
+  return best_bb;
 }
 
 /* Given a statement (STMT) and the basic block it is currently in (FROMBB),
@@ -452,13 +432,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
     return false;
   
   sinkbb = select_best_block (frombb, sinkbb, stmt);
-  if (!sinkbb || sinkbb == frombb)
-    return false;
-
-  /* If the latch block is empty, don't make it non-empty by sinking
-     something into it.  */
-  if (sinkbb == frombb->loop_father->latch
-      && empty_block_p (sinkbb))
+  if (sinkbb == frombb)
     return false;
 
   *togsi = gsi_after_labels (sinkbb);
@@ -822,6 +796,7 @@ pass_sink_code::execute (function *fun)
   mark_dfs_back_edges (fun);
   memset (&sink_stats, 0, sizeof (sink_stats));
   calculate_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_POST_DOMINATORS);
 
   virtual_operand_live vop_live;
 
@@ -833,6 +808,7 @@ pass_sink_code::execute (function *fun)
 
   statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
   statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
+  free_dominance_info (CDI_POST_DOMINATORS);
   remove_fake_exit_edges ();
   loop_optimizer_finalize ();
 
-- 
2.35.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
  2024-06-14 13:08 ` Li, Pan2
@ 2024-06-14 13:10   ` Richard Biener
  2024-06-14 13:23     ` Li, Pan2
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2024-06-14 13:10 UTC (permalink / raw)
  To: Li, Pan2; +Cc: gcc-patches

On Fri, 14 Jun 2024, Li, Pan2 wrote:

> Hi Richard,
> 
> Here is one PR related to this patch (by git bisect), details as below.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
> 
> I am still trying to narrow down which change caused this failure or any hints here?

It definitely looks like a latent issue being triggered.  Either in LRA
or in how the target presents itself.

Richard.

> Pan
> 
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de> 
> Sent: Wednesday, May 15, 2024 5:39 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
> 
> The following removes the profile based heuristic limiting sinking
> and instead uses post-dominators to avoid sinking to places that
> are executed under the same conditions as the earlier location which
> the profile based heuristic should have guaranteed as well.
> 
> To avoid regressing this moves the empty-latch check to cover all
> sink cases.
> 
> It also stream-lines the resulting select_best_block a bit but avoids
> adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
> starts execute failing with this on x86_64 with -m32 because the
> (float)i * 9.9999...e-7 compute is sunk across a STOP causing it
> to be no longer spilled and thus the compare failing due to excess
> precision.  The patch adds -ffloat-store to avoid this, following
> other similar testcases.
> 
> This change doesn't fix the testcase in the PR on itself.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.
> 
> 	PR tree-optimization/114589
> 	* tree-ssa-sink.cc (select_best_block): Remove profile-based
> 	heuristics.  Instead reject sink locations that sink
>         to post-dominators.  Move empty latch check here from
> 	statement_sink_location.  Also consider early_bb for the
> 	loop depth check.
> 	(statement_sink_location): Remove superfluous check.  Remove
> 	empty latch check.
> 	(pass_sink_code::execute): Compute/release post-dominators.
> 
> 	* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
> 	excess precision when not spilling.
> ---
>  gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
>  gcc/tree-ssa-sink.cc                     | 62 ++++++++----------------
>  2 files changed, 20 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 b/gcc/testsuite/gfortran.dg/streamio_9.f90
> index b6bddb973f8..f29ded6ba54 100644
> --- a/gcc/testsuite/gfortran.dg/streamio_9.f90
> +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-options "-ffloat-store" }
>  ! PR29053 Stream IO test 9.
>  ! Contributed by Jerry DeLisle <jvdelisle@gcc.gnu.org>.
>  ! Test case derived from that given in PR by Steve Kargl.
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2f90acb7ef4..2188b7523c7 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>  
>     We want the most control dependent block in the shallowest loop nest.
>  
> -   If the resulting block is in a shallower loop nest, then use it.  Else
> -   only use the resulting block if it has significantly lower execution
> -   frequency than EARLY_BB to avoid gratuitous statement movement.  We
> -   consider statements with VOPS more desirable to move.
> -
> -   This pass would obviously benefit from PDO as it utilizes block
> -   frequencies.  It would also benefit from recomputing frequencies
> -   if profile data is not available since frequencies often get out
> -   of sync with reality.  */
> +   If the resulting block is in a shallower loop nest, then use it.  */
>  
>  static basic_block
>  select_best_block (basic_block early_bb,
> @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
>  {
>    basic_block best_bb = late_bb;
>    basic_block temp_bb = late_bb;
> -  int threshold;
>  
>    while (temp_bb != early_bb)
>      {
> +      /* Walk up the dominator tree, hopefully we'll find a shallower
> +	 loop nest.  */
> +      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> +
>        /* If we've moved into a lower loop nest, then that becomes
>  	 our best block.  */
>        if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>  	best_bb = temp_bb;
> -
> -      /* Walk up the dominator tree, hopefully we'll find a shallower
> - 	 loop nest.  */
> -      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>      }
>  
>    /* Placing a statement before a setjmp-like function would be invalid
> @@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
>    if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
>      return best_bb;
>  
> +  /* Do not move stmts to post-dominating places on the same loop depth.  */
> +  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
> +    return early_bb;
> +
> +  /* If the latch block is empty, don't make it non-empty by sinking
> +     something into it.  */
> +  if (best_bb == early_bb->loop_father->latch
> +      && empty_block_p (best_bb))
> +    return early_bb;
> +
>    /* Avoid turning an unconditional read into a conditional one when we
>       still might want to perform vectorization.  */
>    if (best_bb->loop_father == early_bb->loop_father
> @@ -233,28 +234,7 @@ select_best_block (basic_block early_bb,
>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>      return early_bb;
>  
> -  /* Get the sinking threshold.  If the statement to be moved has memory
> -     operands, then increase the threshold by 7% as those are even more
> -     profitable to avoid, clamping at 100%.  */
> -  threshold = param_sink_frequency_threshold;
> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
> -    {
> -      threshold += 7;
> -      if (threshold > 100)
> -	threshold = 100;
> -    }
> -
> -  /* If BEST_BB is at the same nesting level, then require it to have
> -     significantly lower execution frequency to avoid gratuitous movement.  */
> -  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
> -      /* If result of comparsion is unknown, prefer EARLY_BB.
> -	 Thus use !(...>=..) rather than (...<...)  */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> -    return best_bb;
> -
> -  /* No better block found, so return EARLY_BB, which happens to be the
> -     statement's original block.  */
> -  return early_bb;
> +  return best_bb;
>  }
>  
>  /* Given a statement (STMT) and the basic block it is currently in (FROMBB),
> @@ -452,13 +432,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>      return false;
>    
>    sinkbb = select_best_block (frombb, sinkbb, stmt);
> -  if (!sinkbb || sinkbb == frombb)
> -    return false;
> -
> -  /* If the latch block is empty, don't make it non-empty by sinking
> -     something into it.  */
> -  if (sinkbb == frombb->loop_father->latch
> -      && empty_block_p (sinkbb))
> +  if (sinkbb == frombb)
>      return false;
>  
>    *togsi = gsi_after_labels (sinkbb);
> @@ -822,6 +796,7 @@ pass_sink_code::execute (function *fun)
>    mark_dfs_back_edges (fun);
>    memset (&sink_stats, 0, sizeof (sink_stats));
>    calculate_dominance_info (CDI_DOMINATORS);
> +  calculate_dominance_info (CDI_POST_DOMINATORS);
>  
>    virtual_operand_live vop_live;
>  
> @@ -833,6 +808,7 @@ pass_sink_code::execute (function *fun)
>  
>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
> +  free_dominance_info (CDI_POST_DOMINATORS);
>    remove_fake_exit_edges ();
>    loop_optimizer_finalize ();
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
  2024-06-14 13:10   ` Richard Biener
@ 2024-06-14 13:23     ` Li, Pan2
  0 siblings, 0 replies; 5+ messages in thread
From: Li, Pan2 @ 2024-06-14 13:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

> It definitely looks like a latent issue being triggered.  Either in LRA
> or in how the target presents itself.

Thanks Richard, will have a try and keep you posted.

Pan

-----Original Message-----
From: Richard Biener <rguenther@suse.de> 
Sent: Friday, June 14, 2024 9:11 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

On Fri, 14 Jun 2024, Li, Pan2 wrote:

> Hi Richard,
> 
> Here is one PR related to this patch (by git bisect), details as below.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
> 
> I am still trying to narrow down which change caused this failure or any hints here?

It definitely looks like a latent issue being triggered.  Either in LRA
or in how the target presents itself.

Richard.

> Pan
> 
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de> 
> Sent: Wednesday, May 15, 2024 5:39 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
> 
> The following removes the profile based heuristic limiting sinking
> and instead uses post-dominators to avoid sinking to places that
> are executed under the same conditions as the earlier location which
> the profile based heuristic should have guaranteed as well.
> 
> To avoid regressing this moves the empty-latch check to cover all
> sink cases.
> 
> It also stream-lines the resulting select_best_block a bit but avoids
> adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
> starts execute failing with this on x86_64 with -m32 because the
> (float)i * 9.9999...e-7 compute is sunk across a STOP causing it
> to be no longer spilled and thus the compare failing due to excess
> precision.  The patch adds -ffloat-store to avoid this, following
> other similar testcases.
> 
> This change doesn't fix the testcase in the PR on itself.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.
> 
> 	PR tree-optimization/114589
> 	* tree-ssa-sink.cc (select_best_block): Remove profile-based
> 	heuristics.  Instead reject sink locations that sink
>         to post-dominators.  Move empty latch check here from
> 	statement_sink_location.  Also consider early_bb for the
> 	loop depth check.
> 	(statement_sink_location): Remove superfluous check.  Remove
> 	empty latch check.
> 	(pass_sink_code::execute): Compute/release post-dominators.
> 
> 	* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
> 	excess precision when not spilling.
> ---
>  gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
>  gcc/tree-ssa-sink.cc                     | 62 ++++++++----------------
>  2 files changed, 20 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 b/gcc/testsuite/gfortran.dg/streamio_9.f90
> index b6bddb973f8..f29ded6ba54 100644
> --- a/gcc/testsuite/gfortran.dg/streamio_9.f90
> +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-options "-ffloat-store" }
>  ! PR29053 Stream IO test 9.
>  ! Contributed by Jerry DeLisle <jvdelisle@gcc.gnu.org>.
>  ! Test case derived from that given in PR by Steve Kargl.
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2f90acb7ef4..2188b7523c7 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>  
>     We want the most control dependent block in the shallowest loop nest.
>  
> -   If the resulting block is in a shallower loop nest, then use it.  Else
> -   only use the resulting block if it has significantly lower execution
> -   frequency than EARLY_BB to avoid gratuitous statement movement.  We
> -   consider statements with VOPS more desirable to move.
> -
> -   This pass would obviously benefit from PDO as it utilizes block
> -   frequencies.  It would also benefit from recomputing frequencies
> -   if profile data is not available since frequencies often get out
> -   of sync with reality.  */
> +   If the resulting block is in a shallower loop nest, then use it.  */
>  
>  static basic_block
>  select_best_block (basic_block early_bb,
> @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
>  {
>    basic_block best_bb = late_bb;
>    basic_block temp_bb = late_bb;
> -  int threshold;
>  
>    while (temp_bb != early_bb)
>      {
> +      /* Walk up the dominator tree, hopefully we'll find a shallower
> +	 loop nest.  */
> +      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> +
>        /* If we've moved into a lower loop nest, then that becomes
>  	 our best block.  */
>        if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>  	best_bb = temp_bb;
> -
> -      /* Walk up the dominator tree, hopefully we'll find a shallower
> - 	 loop nest.  */
> -      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>      }
>  
>    /* Placing a statement before a setjmp-like function would be invalid
> @@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
>    if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
>      return best_bb;
>  
> +  /* Do not move stmts to post-dominating places on the same loop depth.  */
> +  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
> +    return early_bb;
> +
> +  /* If the latch block is empty, don't make it non-empty by sinking
> +     something into it.  */
> +  if (best_bb == early_bb->loop_father->latch
> +      && empty_block_p (best_bb))
> +    return early_bb;
> +
>    /* Avoid turning an unconditional read into a conditional one when we
>       still might want to perform vectorization.  */
>    if (best_bb->loop_father == early_bb->loop_father
> @@ -233,28 +234,7 @@ select_best_block (basic_block early_bb,
>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>      return early_bb;
>  
> -  /* Get the sinking threshold.  If the statement to be moved has memory
> -     operands, then increase the threshold by 7% as those are even more
> -     profitable to avoid, clamping at 100%.  */
> -  threshold = param_sink_frequency_threshold;
> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
> -    {
> -      threshold += 7;
> -      if (threshold > 100)
> -	threshold = 100;
> -    }
> -
> -  /* If BEST_BB is at the same nesting level, then require it to have
> -     significantly lower execution frequency to avoid gratuitous movement.  */
> -  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
> -      /* If result of comparsion is unknown, prefer EARLY_BB.
> -	 Thus use !(...>=..) rather than (...<...)  */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> -    return best_bb;
> -
> -  /* No better block found, so return EARLY_BB, which happens to be the
> -     statement's original block.  */
> -  return early_bb;
> +  return best_bb;
>  }
>  
>  /* Given a statement (STMT) and the basic block it is currently in (FROMBB),
> @@ -452,13 +432,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>      return false;
>    
>    sinkbb = select_best_block (frombb, sinkbb, stmt);
> -  if (!sinkbb || sinkbb == frombb)
> -    return false;
> -
> -  /* If the latch block is empty, don't make it non-empty by sinking
> -     something into it.  */
> -  if (sinkbb == frombb->loop_father->latch
> -      && empty_block_p (sinkbb))
> +  if (sinkbb == frombb)
>      return false;
>  
>    *togsi = gsi_after_labels (sinkbb);
> @@ -822,6 +796,7 @@ pass_sink_code::execute (function *fun)
>    mark_dfs_back_edges (fun);
>    memset (&sink_stats, 0, sizeof (sink_stats));
>    calculate_dominance_info (CDI_DOMINATORS);
> +  calculate_dominance_info (CDI_POST_DOMINATORS);
>  
>    virtual_operand_live vop_live;
>  
> @@ -833,6 +808,7 @@ pass_sink_code::execute (function *fun)
>  
>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
> +  free_dominance_info (CDI_POST_DOMINATORS);
>    remove_fake_exit_edges ();
>    loop_optimizer_finalize ();
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-06-14 13:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-15  9:38 [PATCH] tree-optimization/114589 - remove profile based sink heuristics Richard Biener
2024-05-18  3:27 ` Hans-Peter Nilsson
2024-06-14 13:08 ` Li, Pan2
2024-06-14 13:10   ` Richard Biener
2024-06-14 13:23     ` Li, Pan2

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).