public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFC/PATCH] sched: Consider debug insn in no_real_insns_p [PR108273]
@ 2023-03-20  6:31 Kewen.Lin
  2023-03-20  8:03 ` Alexander Monakov
  0 siblings, 1 reply; 7+ messages in thread
From: Kewen.Lin @ 2023-03-20  6:31 UTC (permalink / raw)
  To: GCC Patches
  Cc: Segher Boessenkool, Peter Bergner, Richard Biener, Jeff Law,
	vladimir Makarov, amonakov, Richard Sandiford

Hi,

As PR108273 shows, when there is one block which only has
NOTE_P and LABEL_P insns at non-debug mode while has some
extra DEBUG_INSN_P insns at debug mode, after scheduling
it, the DFA states would be different between debug mode
and non-debug mode.  Since at non-debug mode, the block
meets no_real_insns_p, it gets skipped; while at debug
mode, it gets scheduled, even it only has NOTE_P, LABEL_P
and DEBUG_INSN_P, the call of function advance_one_cycle
will change the DFA state.  PR108519 also shows this issue
issue can be exposed by some scheduler changes.

This patch is to take debug insn into account in function
no_real_insns_p, which make us not try to schedule for the
block having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
resulting in consistent DFA states between non-debug and
debug mode.  Changing no_real_insns_p caused ICE when doing
free_block_dependencies, the root cause is that we create
dependencies for debug insns, those dependencies are
expected to be resolved during scheduling insns which gets
skipped after the change in no_real_insns_p.  By checking
the code, it looks it's reasonable to skip to compute block
dependencies for no_real_insns_p blocks.  It can be
bootstrapped and regtested but it hit one ICE when built
SPEC2017 bmks at option -O2 -g.  The root cause is that
initially there are no no_real_insns_p blocks in a region,
but in the later scheduling one block has one insn scheduled
speculatively then becomes no_real_insns_p, so we compute
dependencies and rgn_n_insns for this special block before
scheduling, later it gets skipped so not scheduled, the
following counts would mismatch:

    /* Sanity check: verify that all region insns were scheduled.  */
      gcc_assert (sched_rgn_n_insns == rgn_n_insns);

, and we miss to release the allocated dependencies.

To avoid the unexpected mis-matchings, this patch adds one
bitmap to track this kind of special block which isn't
no_real_insns_p but becomes no_real_insns_p later, then we
can adjust the count and free deps for it.

This patch can be bootstrapped and regress-tested on
x86_64-redhat-linux, aarch64-linux-gnu and
powerpc64{,le}-linux-gnu.

I also verified this patch can pass SPEC2017 both intrate
and fprate bmks building at -g -O2/-O3.

This is for next stage 1, but since I know little on the
scheduler, I'd like to post it early for more comments.

Is it on the right track?  Any thoughts?

BR,
Kewen
-----
	PR rtl-optimization/108273

gcc/ChangeLog:

	* haifa-sched.cc (no_real_insns_p): Consider DEBUG_INSN_P insn.
	* sched-rgn.cc (no_real_insns): New static bitmap variable.
	(compute_block_dependences): Skip for no_real_insns_p.
	(free_deps_for_bb_no_real_insns_p): New function.
	(free_block_dependencies): Call free_deps_for_bb_no_real_insns_p for
	no_real_insns_p bb.
	(schedule_region): Fix up sched_rgn_n_insns for some block for which
	rgn_n_insns is computed before, and move sched_rgn_local_finish after
	free_block_dependencies loop.
	(sched_rgn_local_init): Allocate and compute no_real_insns.
	(sched_rgn_local_free): Free no_real_insns.
---
 gcc/haifa-sched.cc |  8 ++++-
 gcc/sched-rgn.cc   | 84 +++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 48b53776fa9..378f3b34cc0 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -5040,7 +5040,13 @@ no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
 {
   while (head != NEXT_INSN (tail))
     {
-      if (!NOTE_P (head) && !LABEL_P (head))
+      /* Take debug insn into account here, otherwise we can have different
+	 DFA states after scheduling a block which only has NOTE_P, LABEL_P
+	 and DEBUG_P (debug mode) insns between non-debug and debug modes,
+	 it could cause -fcompare-debug failure.  */
+      if (!NOTE_P (head)
+	  && !LABEL_P (head)
+	  && !DEBUG_INSN_P (head))
 	return 0;
       head = NEXT_INSN (head);
     }
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index f2751f62450..211b62e2b4a 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -213,6 +213,11 @@ static int rgn_nr_edges;
 /* Array of size rgn_nr_edges.  */
 static edge *rgn_edges;

+/* For basic block i, the corresponding set bit i in bitmap indicates this basic
+   block meets predicate no_real_insns_p before scheduling any basic blocks in
+   the region.  */
+static bitmap no_real_insns;
+
 /* Mapping from each edge in the graph to its number in the rgn.  */
 #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
 #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
@@ -2730,6 +2735,15 @@ compute_block_dependences (int bb)
   gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

+  /* Don't compute block dependencies if there are no real insns.  */
+  if (no_real_insns_p (head, tail))
+    {
+      if (current_nr_blocks > 1)
+	propagate_deps (bb, &tmp_deps);
+      free_deps (&tmp_deps);
+      return;
+    }
+
   sched_analyze (&tmp_deps, head, tail);

   add_branch_dependences (head, tail);
@@ -2744,6 +2758,42 @@ compute_block_dependences (int bb)
     targetm.sched.dependencies_evaluation_hook (head, tail);
 }

+/* The basic block without any real insns (no_real_insns_p) would be
+   skipped in compute_block_dependences, so for most no_real_insns_p
+   basic block, we don't need to free dependencies for them.  But
+   sometimes some basic block which isn't no_real_insns_p before
+   scheduling can become no_real_insns_p with speculative scheduling,
+   so we still need to free dependencies computed early for it.  So
+   this function is to free dependencies if need.  */
+
+static void
+free_deps_for_bb_no_real_insns_p (rtx_insn *head, rtx_insn *tail, int bb)
+{
+  gcc_assert (no_real_insns_p (head, tail));
+
+  /* Don't bother if there is only one block.  */
+  if (current_nr_blocks == 1)
+    return;
+
+  /* We don't compute dependencies before.  */
+  if (bitmap_bit_p (no_real_insns, bb))
+    return;
+
+  rtx_insn *insn;
+  rtx_insn *next_tail = NEXT_INSN (tail);
+  sd_iterator_def sd_it;
+  dep_t dep;
+
+  /* There could be some insns which get skipped in scheduling but we
+     compute dependencies for them previously, so make them resolved.  */
+  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
+    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
+	 sd_iterator_cond (&sd_it, &dep);)
+      sd_resolve_dep (sd_it);
+
+  sched_free_deps (head, tail, true);
+}
+
 /* Free dependencies of instructions inside BB.  */
 static void
 free_block_dependencies (int bb)
@@ -2754,7 +2804,10 @@ free_block_dependencies (int bb)
   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

   if (no_real_insns_p (head, tail))
-    return;
+    {
+      free_deps_for_bb_no_real_insns_p (head, tail, bb);
+      return;
+    }

   sched_free_deps (head, tail, true);
 }
@@ -3177,6 +3230,18 @@ schedule_region (int rgn)
 	{
 	  gcc_assert (first_bb == last_bb);
 	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
+	  /* We have counted this block when computing rgn_n_insns
+	     previously, so need to fix up sched_rgn_n_insns if we
+	     skip this.  */
+	  if (current_nr_blocks > 1 && !bitmap_bit_p (no_real_insns, bb))
+	    {
+	      while (head != NEXT_INSN (tail))
+		{
+		  if (INSN_P (head))
+		    sched_rgn_n_insns++;
+		  head = NEXT_INSN (head);
+		}
+	    }
 	  continue;
 	}

@@ -3218,13 +3283,13 @@ schedule_region (int rgn)

   sched_finish_ready_list ();

-  /* Done with this region.  */
-  sched_rgn_local_finish ();
-
   /* Free dependencies.  */
   for (bb = 0; bb < current_nr_blocks; ++bb)
     free_block_dependencies (bb);

+  /* Done with this region.  */
+  sched_rgn_local_finish ();
+
   gcc_assert (haifa_recovery_bb_ever_added_p
 	      || deps_pools_are_empty_p ());
 }
@@ -3444,6 +3509,16 @@ sched_rgn_local_init (int rgn)
 	  FOR_EACH_EDGE (e, ei, block->succs)
 	    e->aux = NULL;
         }
+
+      /* Compute no_real_insns.  */
+      no_real_insns = BITMAP_ALLOC (NULL);
+      for (bb = 0; bb < current_nr_blocks; bb++)
+	{
+	  rtx_insn *head, *tail;
+	  get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+	  if (no_real_insns_p (head, tail))
+	    bitmap_set_bit (no_real_insns, bb);
+	}
     }
 }

@@ -3456,6 +3531,7 @@ sched_rgn_local_free (void)
   sbitmap_vector_free (pot_split);
   sbitmap_vector_free (ancestor_edges);
   free (rgn_edges);
+  BITMAP_FREE (no_real_insns);
 }

 /* Free data computed for the finished region.  */
--
2.39.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] sched: Consider debug insn in no_real_insns_p [PR108273]
  2023-03-20  6:31 [RFC/PATCH] sched: Consider debug insn in no_real_insns_p [PR108273] Kewen.Lin
@ 2023-03-20  8:03 ` Alexander Monakov
  2023-03-20  9:48   ` Kewen.Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Monakov @ 2023-03-20  8:03 UTC (permalink / raw)
  To: Kewen.Lin
  Cc: GCC Patches, Segher Boessenkool, Peter Bergner, Richard Biener,
	Jeff Law, Vladimir Makarov, Richard Sandiford


On Mon, 20 Mar 2023, Kewen.Lin wrote:

> Hi,

Hi. Thank you for the thorough analysis. Since I analyzed
PR108519, I'd like to offer my comments.

> As PR108273 shows, when there is one block which only has
> NOTE_P and LABEL_P insns at non-debug mode while has some
> extra DEBUG_INSN_P insns at debug mode, after scheduling
> it, the DFA states would be different between debug mode
> and non-debug mode.  Since at non-debug mode, the block
> meets no_real_insns_p, it gets skipped; while at debug
> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
> and DEBUG_INSN_P, the call of function advance_one_cycle
> will change the DFA state.  PR108519 also shows this issue
> issue can be exposed by some scheduler changes.

(yes, so an alternative is to avoid extraneous advance_one_cycle
calls, but I think adjusting no_real_insns_p is preferable)

> This patch is to take debug insn into account in function
> no_real_insns_p, which make us not try to schedule for the
> block having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
> resulting in consistent DFA states between non-debug and
> debug mode.  Changing no_real_insns_p caused ICE when doing
> free_block_dependencies, the root cause is that we create
> dependencies for debug insns, those dependencies are
> expected to be resolved during scheduling insns which gets
> skipped after the change in no_real_insns_p.  By checking
> the code, it looks it's reasonable to skip to compute block
> dependencies for no_real_insns_p blocks.  It can be
> bootstrapped and regtested but it hit one ICE when built
> SPEC2017 bmks at option -O2 -g.  The root cause is that
> initially there are no no_real_insns_p blocks in a region,
> but in the later scheduling one block has one insn scheduled
> speculatively then becomes no_real_insns_p, so we compute
> dependencies and rgn_n_insns for this special block before
> scheduling, later it gets skipped so not scheduled, the
> following counts would mismatch:
> 
>     /* Sanity check: verify that all region insns were scheduled.  */
>       gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> 
> , and we miss to release the allocated dependencies.

Hm, but it is quite normal for BBs to become empty via speculative
scheduling in non-debug mode as well. So I don't think it's the
right way to frame the problem.

I think the main issue here is that debug_insns are "not real insns"
except we add them together with normal insns in the dependency graph,
and then we verify that the graph was exhausted by the scheduler.

We already handle a situation when dbg_cnt is telling the scheduler
to skip blocks. I guess the dbg_cnt handling is broken for a similar
reason?

Can we fix this issue together with the debug_cnt issue by adjusting
dbg_cnt handling in schedule_region, i.e. if no_real_insns_p || !dbg_cnt
then adjust sched_rgn_n_insns and manually resolve+free dependencies?

> To avoid the unexpected mis-matchings, this patch adds one
> bitmap to track this kind of special block which isn't
> no_real_insns_p but becomes no_real_insns_p later, then we
> can adjust the count and free deps for it.

Per above, I hope a simpler solution is possible.

(some comments on the patch below)

> This patch can be bootstrapped and regress-tested on
> x86_64-redhat-linux, aarch64-linux-gnu and
> powerpc64{,le}-linux-gnu.
> 
> I also verified this patch can pass SPEC2017 both intrate
> and fprate bmks building at -g -O2/-O3.
> 
> This is for next stage 1, but since I know little on the
> scheduler, I'd like to post it early for more comments.
> 
> Is it on the right track?  Any thoughts?
> 
> BR,
> Kewen
> -----
> 	PR rtl-optimization/108273
> 
> gcc/ChangeLog:
> 
> 	* haifa-sched.cc (no_real_insns_p): Consider DEBUG_INSN_P insn.
> 	* sched-rgn.cc (no_real_insns): New static bitmap variable.
> 	(compute_block_dependences): Skip for no_real_insns_p.
> 	(free_deps_for_bb_no_real_insns_p): New function.
> 	(free_block_dependencies): Call free_deps_for_bb_no_real_insns_p for
> 	no_real_insns_p bb.
> 	(schedule_region): Fix up sched_rgn_n_insns for some block for which
> 	rgn_n_insns is computed before, and move sched_rgn_local_finish after
> 	free_block_dependencies loop.
> 	(sched_rgn_local_init): Allocate and compute no_real_insns.
> 	(sched_rgn_local_free): Free no_real_insns.
> ---
>  gcc/haifa-sched.cc |  8 ++++-
>  gcc/sched-rgn.cc   | 84 +++++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 87 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
> index 48b53776fa9..378f3b34cc0 100644
> --- a/gcc/haifa-sched.cc
> +++ b/gcc/haifa-sched.cc
> @@ -5040,7 +5040,13 @@ no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
>  {
>    while (head != NEXT_INSN (tail))
>      {
> -      if (!NOTE_P (head) && !LABEL_P (head))
> +      /* Take debug insn into account here, otherwise we can have different
> +	 DFA states after scheduling a block which only has NOTE_P, LABEL_P
> +	 and DEBUG_P (debug mode) insns between non-debug and debug modes,
> +	 it could cause -fcompare-debug failure.  */

Sorry, I don't think this comment is appropriate. I'd suggest to rename the
function to no_real_nondebug_insns_p (so when you adjust all callers it
becomes apparent what's affected), and then no comment will be necessary.

> +      if (!NOTE_P (head)
> +	  && !LABEL_P (head)
> +	  && !DEBUG_INSN_P (head))
>  	return 0;
>        head = NEXT_INSN (head);
>      }
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index f2751f62450..211b62e2b4a 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -213,6 +213,11 @@ static int rgn_nr_edges;
>  /* Array of size rgn_nr_edges.  */
>  static edge *rgn_edges;
> 
> +/* For basic block i, the corresponding set bit i in bitmap indicates this basic
> +   block meets predicate no_real_insns_p before scheduling any basic blocks in
> +   the region.  */
> +static bitmap no_real_insns;
> +
>  /* Mapping from each edge in the graph to its number in the rgn.  */
>  #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>  #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
> @@ -2730,6 +2735,15 @@ compute_block_dependences (int bb)
>    gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> 
> +  /* Don't compute block dependencies if there are no real insns.  */
> +  if (no_real_insns_p (head, tail))
> +    {
> +      if (current_nr_blocks > 1)
> +	propagate_deps (bb, &tmp_deps);
> +      free_deps (&tmp_deps);
> +      return;
> +    }
> +
>    sched_analyze (&tmp_deps, head, tail);
> 
>    add_branch_dependences (head, tail);
> @@ -2744,6 +2758,42 @@ compute_block_dependences (int bb)
>      targetm.sched.dependencies_evaluation_hook (head, tail);
>  }
> 
> +/* The basic block without any real insns (no_real_insns_p) would be
> +   skipped in compute_block_dependences, so for most no_real_insns_p
> +   basic block, we don't need to free dependencies for them.  But
> +   sometimes some basic block which isn't no_real_insns_p before
> +   scheduling can become no_real_insns_p with speculative scheduling,
> +   so we still need to free dependencies computed early for it.  So
> +   this function is to free dependencies if need.  */

Please try to keep comments more concise. The following might have been
more appropriate:

/* Artificially resolve and free dependencies for instructions HEAD to TAIL.  */

and the explanation why we are doing that would go to the caller.

> +
> +static void
> +free_deps_for_bb_no_real_insns_p (rtx_insn *head, rtx_insn *tail, int bb)
> +{
> +  gcc_assert (no_real_insns_p (head, tail));
> +
> +  /* Don't bother if there is only one block.  */
> +  if (current_nr_blocks == 1)
> +    return;

Sorry, can you explain this check?

> +
> +  /* We don't compute dependencies before.  */
> +  if (bitmap_bit_p (no_real_insns, bb))
> +    return;

This looks more appropriate to check in the caller.

Alexander

> +
> +  rtx_insn *insn;
> +  rtx_insn *next_tail = NEXT_INSN (tail);
> +  sd_iterator_def sd_it;
> +  dep_t dep;
> +
> +  /* There could be some insns which get skipped in scheduling but we
> +     compute dependencies for them previously, so make them resolved.  */
> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
> +    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
> +	 sd_iterator_cond (&sd_it, &dep);)
> +      sd_resolve_dep (sd_it);
> +
> +  sched_free_deps (head, tail, true);
> +}
> +
>  /* Free dependencies of instructions inside BB.  */
>  static void
>  free_block_dependencies (int bb)
> @@ -2754,7 +2804,10 @@ free_block_dependencies (int bb)
>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> 
>    if (no_real_insns_p (head, tail))
> -    return;
> +    {
> +      free_deps_for_bb_no_real_insns_p (head, tail, bb);
> +      return;
> +    }
> 
>    sched_free_deps (head, tail, true);
>  }
> @@ -3177,6 +3230,18 @@ schedule_region (int rgn)
>  	{
>  	  gcc_assert (first_bb == last_bb);
>  	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
> +	  /* We have counted this block when computing rgn_n_insns
> +	     previously, so need to fix up sched_rgn_n_insns if we
> +	     skip this.  */
> +	  if (current_nr_blocks > 1 && !bitmap_bit_p (no_real_insns, bb))
> +	    {
> +	      while (head != NEXT_INSN (tail))
> +		{
> +		  if (INSN_P (head))
> +		    sched_rgn_n_insns++;
> +		  head = NEXT_INSN (head);
> +		}
> +	    }
>  	  continue;
>  	}
> 
> @@ -3218,13 +3283,13 @@ schedule_region (int rgn)
> 
>    sched_finish_ready_list ();
> 
> -  /* Done with this region.  */
> -  sched_rgn_local_finish ();
> -
>    /* Free dependencies.  */
>    for (bb = 0; bb < current_nr_blocks; ++bb)
>      free_block_dependencies (bb);
> 
> +  /* Done with this region.  */
> +  sched_rgn_local_finish ();
> +
>    gcc_assert (haifa_recovery_bb_ever_added_p
>  	      || deps_pools_are_empty_p ());
>  }
> @@ -3444,6 +3509,16 @@ sched_rgn_local_init (int rgn)
>  	  FOR_EACH_EDGE (e, ei, block->succs)
>  	    e->aux = NULL;
>          }
> +
> +      /* Compute no_real_insns.  */
> +      no_real_insns = BITMAP_ALLOC (NULL);
> +      for (bb = 0; bb < current_nr_blocks; bb++)
> +	{
> +	  rtx_insn *head, *tail;
> +	  get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> +	  if (no_real_insns_p (head, tail))
> +	    bitmap_set_bit (no_real_insns, bb);
> +	}
>      }
>  }
> 
> @@ -3456,6 +3531,7 @@ sched_rgn_local_free (void)
>    sbitmap_vector_free (pot_split);
>    sbitmap_vector_free (ancestor_edges);
>    free (rgn_edges);
> +  BITMAP_FREE (no_real_insns);
>  }
> 
>  /* Free data computed for the finished region.  */
> --
> 2.39.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] sched: Consider debug insn in no_real_insns_p [PR108273]
  2023-03-20  8:03 ` Alexander Monakov
@ 2023-03-20  9:48   ` Kewen.Lin
  2023-03-29  7:18     ` [PATCH v2] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273] Kewen.Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Kewen.Lin @ 2023-03-20  9:48 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: GCC Patches, Segher Boessenkool, Peter Bergner, Richard Biener,
	Jeff Law, Vladimir Makarov, Richard Sandiford

Hi Alexander,

Thanks a lot for your comments and suggestions!

on 2023/3/20 16:03, Alexander Monakov wrote:
> 
> On Mon, 20 Mar 2023, Kewen.Lin wrote:
> 
>> Hi,
> 
> Hi. Thank you for the thorough analysis. Since I analyzed
> PR108519, I'd like to offer my comments.
> 
>> As PR108273 shows, when there is one block which only has
>> NOTE_P and LABEL_P insns at non-debug mode while has some
>> extra DEBUG_INSN_P insns at debug mode, after scheduling
>> it, the DFA states would be different between debug mode
>> and non-debug mode.  Since at non-debug mode, the block
>> meets no_real_insns_p, it gets skipped; while at debug
>> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
>> and DEBUG_INSN_P, the call of function advance_one_cycle
>> will change the DFA state.  PR108519 also shows this issue
>> issue can be exposed by some scheduler changes.
> 
> (yes, so an alternative is to avoid extraneous advance_one_cycle
> calls, but I think adjusting no_real_insns_p is preferable)
> 

Yeah, it looks no value to enter schedule_block for this kind
of no_real_nondebug_insns_p block.

>> This patch is to take debug insn into account in function
>> no_real_insns_p, which make us not try to schedule for the
>> block having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>> resulting in consistent DFA states between non-debug and
>> debug mode.  Changing no_real_insns_p caused ICE when doing
>> free_block_dependencies, the root cause is that we create
>> dependencies for debug insns, those dependencies are
>> expected to be resolved during scheduling insns which gets
>> skipped after the change in no_real_insns_p.  By checking
>> the code, it looks it's reasonable to skip to compute block
>> dependencies for no_real_insns_p blocks.  It can be
>> bootstrapped and regtested but it hit one ICE when built
>> SPEC2017 bmks at option -O2 -g.  The root cause is that
>> initially there are no no_real_insns_p blocks in a region,
>> but in the later scheduling one block has one insn scheduled
>> speculatively then becomes no_real_insns_p, so we compute
>> dependencies and rgn_n_insns for this special block before
>> scheduling, later it gets skipped so not scheduled, the
>> following counts would mismatch:
>>
>>     /* Sanity check: verify that all region insns were scheduled.  */
>>       gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>>
>> , and we miss to release the allocated dependencies.
> 
> Hm, but it is quite normal for BBs to become empty via speculative
> scheduling in non-debug mode as well. So I don't think it's the
> right way to frame the problem.
> 
> I think the main issue here is that debug_insns are "not real insns"
> except we add them together with normal insns in the dependency graph,
> and then we verify that the graph was exhausted by the scheduler.

Thanks for your better description and explanation!  Yeah, it creates
dependencies and makes statistics for debug insns like the other normal
insns.

> 
> We already handle a situation when dbg_cnt is telling the scheduler
> to skip blocks. I guess the dbg_cnt handling is broken for a similar
> reason?

Yes, the dbg_cnt handlings looks wrong, I didn't dig into the history
but I guessed the original meaning (usage) of rgn_n_insns or
sched_rgn_n_insns have changed over time.

> 
> Can we fix this issue together with the debug_cnt issue by adjusting
> dbg_cnt handling in schedule_region, i.e. if no_real_insns_p || !dbg_cnt
> then adjust sched_rgn_n_insns and manually resolve+free dependencies?

Good idea!  I'll expand this to handle dbg_cnt issue.  One thing I'm not
sure about is that if it's a good idea to skip dependencies computing for
this kind of no_real_nondebug_insns_p block.  The pro is that it's more
efficient by skipping, while the con is that it needs an extra bitmap and
some checks.

> 
>> To avoid the unexpected mis-matchings, this patch adds one
>> bitmap to track this kind of special block which isn't
>> no_real_insns_p but becomes no_real_insns_p later, then we
>> can adjust the count and free deps for it.
> 
> Per above, I hope a simpler solution is possible.
> 
> (some comments on the patch below)
> 
>> This patch can be bootstrapped and regress-tested on
>> x86_64-redhat-linux, aarch64-linux-gnu and
>> powerpc64{,le}-linux-gnu.
>>
>> I also verified this patch can pass SPEC2017 both intrate
>> and fprate bmks building at -g -O2/-O3.
>>
>> This is for next stage 1, but since I know little on the
>> scheduler, I'd like to post it early for more comments.
>>
>> Is it on the right track?  Any thoughts?
>>
>> BR,
>> Kewen
>> -----
>> 	PR rtl-optimization/108273
>>
>> gcc/ChangeLog:
>>
>> 	* haifa-sched.cc (no_real_insns_p): Consider DEBUG_INSN_P insn.
>> 	* sched-rgn.cc (no_real_insns): New static bitmap variable.
>> 	(compute_block_dependences): Skip for no_real_insns_p.
>> 	(free_deps_for_bb_no_real_insns_p): New function.
>> 	(free_block_dependencies): Call free_deps_for_bb_no_real_insns_p for
>> 	no_real_insns_p bb.
>> 	(schedule_region): Fix up sched_rgn_n_insns for some block for which
>> 	rgn_n_insns is computed before, and move sched_rgn_local_finish after
>> 	free_block_dependencies loop.
>> 	(sched_rgn_local_init): Allocate and compute no_real_insns.
>> 	(sched_rgn_local_free): Free no_real_insns.
>> ---
>>  gcc/haifa-sched.cc |  8 ++++-
>>  gcc/sched-rgn.cc   | 84 +++++++++++++++++++++++++++++++++++++++++++---
>>  2 files changed, 87 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
>> index 48b53776fa9..378f3b34cc0 100644
>> --- a/gcc/haifa-sched.cc
>> +++ b/gcc/haifa-sched.cc
>> @@ -5040,7 +5040,13 @@ no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
>>  {
>>    while (head != NEXT_INSN (tail))
>>      {
>> -      if (!NOTE_P (head) && !LABEL_P (head))
>> +      /* Take debug insn into account here, otherwise we can have different
>> +	 DFA states after scheduling a block which only has NOTE_P, LABEL_P
>> +	 and DEBUG_P (debug mode) insns between non-debug and debug modes,
>> +	 it could cause -fcompare-debug failure.  */
> 
> Sorry, I don't think this comment is appropriate. I'd suggest to rename the
> function to no_real_nondebug_insns_p (so when you adjust all callers it
> becomes apparent what's affected), and then no comment will be necessary.

Good idea, will update, thanks!

> 
>> +      if (!NOTE_P (head)
>> +	  && !LABEL_P (head)
>> +	  && !DEBUG_INSN_P (head))
>>  	return 0;
>>        head = NEXT_INSN (head);
>>      }
>> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
>> index f2751f62450..211b62e2b4a 100644
>> --- a/gcc/sched-rgn.cc
>> +++ b/gcc/sched-rgn.cc
>> @@ -213,6 +213,11 @@ static int rgn_nr_edges;
>>  /* Array of size rgn_nr_edges.  */
>>  static edge *rgn_edges;
>>
>> +/* For basic block i, the corresponding set bit i in bitmap indicates this basic
>> +   block meets predicate no_real_insns_p before scheduling any basic blocks in
>> +   the region.  */
>> +static bitmap no_real_insns;
>> +
>>  /* Mapping from each edge in the graph to its number in the rgn.  */
>>  #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>>  #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
>> @@ -2730,6 +2735,15 @@ compute_block_dependences (int bb)
>>    gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>
>> +  /* Don't compute block dependencies if there are no real insns.  */
>> +  if (no_real_insns_p (head, tail))
>> +    {
>> +      if (current_nr_blocks > 1)
>> +	propagate_deps (bb, &tmp_deps);
>> +      free_deps (&tmp_deps);
>> +      return;
>> +    }
>> +
>>    sched_analyze (&tmp_deps, head, tail);
>>
>>    add_branch_dependences (head, tail);
>> @@ -2744,6 +2758,42 @@ compute_block_dependences (int bb)
>>      targetm.sched.dependencies_evaluation_hook (head, tail);
>>  }
>>
>> +/* The basic block without any real insns (no_real_insns_p) would be
>> +   skipped in compute_block_dependences, so for most no_real_insns_p
>> +   basic block, we don't need to free dependencies for them.  But
>> +   sometimes some basic block which isn't no_real_insns_p before
>> +   scheduling can become no_real_insns_p with speculative scheduling,
>> +   so we still need to free dependencies computed early for it.  So
>> +   this function is to free dependencies if need.  */
> 
> Please try to keep comments more concise. The following might have been
> more appropriate:
> 
> /* Artificially resolve and free dependencies for instructions HEAD to TAIL.  */
> > and the explanation why we are doing that would go to the caller.

Will update, thanks!

> 
>> +
>> +static void
>> +free_deps_for_bb_no_real_insns_p (rtx_insn *head, rtx_insn *tail, int bb)
>> +{
>> +  gcc_assert (no_real_insns_p (head, tail));
>> +
>> +  /* Don't bother if there is only one block.  */
>> +  if (current_nr_blocks == 1)
>> +    return;
> 
> Sorry, can you explain this check?

Since bitmap no_real_insns only gets allocated when current_nr_blocks > 1, this
is to avoid an access to an un-allocated bitmap.

> 
>> +
>> +  /* We don't compute dependencies before.  */
>> +  if (bitmap_bit_p (no_real_insns, bb))
>> +    return;
> 
> This looks more appropriate to check in the caller.

Actually in the initial draft I checked this in the call site, but later I thought
maybe it's to make the call site concise by putting all the checks here.  Will
adjust it back.  :)

Thanks again!

BR,
Kewen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]
  2023-03-20  9:48   ` Kewen.Lin
@ 2023-03-29  7:18     ` Kewen.Lin
  2023-05-17  6:20       ` Kewen.Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Kewen.Lin @ 2023-03-29  7:18 UTC (permalink / raw)
  To: GCC Patches
  Cc: Segher Boessenkool, Peter Bergner, Richard Biener, Jeff Law,
	Vladimir Makarov, Richard Sandiford, Alexander Monakov

Hi,

By addressing Alexander's comments, against v1 this
patch v2 mainly:

  - Rename no_real_insns_p to no_real_nondebug_insns_p;
  - Introduce enum rgn_bb_deps_free_action for three
    kinds of actions to free deps;
  - Change function free_deps_for_bb_no_real_insns_p to
    resolve_forw_deps which only focuses on forward deps;
  - Extend the handlings to cover dbg-cnt sched_block,
    add one test case for it;
  - Move free_trg_info call in schedule_region to an
    appropriate place.

One thing I'm not sure about is the change in function
sched_rgn_local_finish, currently the invocation to
sched_rgn_local_free is guarded with !sel_sched_p (),
so I just follow it, but the initialization of those
structures (in sched_rgn_local_init) isn't guarded
with !sel_sched_p (), it looks odd.

----

As PR108273 shows, when there is one block which only has
NOTE_P and LABEL_P insns at non-debug mode while has some
extra DEBUG_INSN_P insns at debug mode, after scheduling
it, the DFA states would be different between debug mode
and non-debug mode.  Since at non-debug mode, the block
meets no_real_insns_p, it gets skipped; while at debug
mode, it gets scheduled, even it only has NOTE_P, LABEL_P
and DEBUG_INSN_P, the call of function advance_one_cycle
will change the DFA state.  PR108519 also shows this issue
issue can be exposed by some scheduler changes.

This patch is to change function no_real_insns_p to
function no_real_nondebug_insns_p by taking debug insn into
account, which make us not try to schedule for the block
having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
resulting in consistent DFA states between non-debug and
debug mode.

Changing no_real_insns_p to no_real_nondebug_insns_p caused
ICE when doing free_block_dependencies, the root cause is
that we create dependencies for debug insns, those
dependencies are expected to be resolved during scheduling
insns, but which gets skipped after this change.
By checking the code, it looks it's reasonable to skip to
compute block dependences for no_real_nondebug_insns_p
blocks.  There is also another issue, which gets exposed
in SPEC2017 bmks build at option -O2 -g, is that we could
skip to schedule some block, which already gets dependency
graph built so has dependencies computed and rgn_n_insns
accumulated, then the later verification on if the graph
becomes exhausted by scheduling would fail as follow:

  /* Sanity check: verify that all region insns were
     scheduled.  */
    gcc_assert (sched_rgn_n_insns == rgn_n_insns);

, and also some forward deps aren't resovled.

As Alexander pointed out, the current debug count handling
also suffers the similar issue, so this patch handles these
two cases together: one is for some block gets skipped by
!dbg_cnt (sched_block), the other is for some block which
is not no_real_nondebug_insns_p initially but becomes
no_real_nondebug_insns_p due to speculative scheduling.

This patch can be bootstrapped and regress-tested on
x86_64-redhat-linux, aarch64-linux-gnu and
powerpc64{,le}-linux-gnu.

I also verified this patch can pass SPEC2017 both intrate
and fprate bmks building at -g -O2/-O3.

Any thoughts?

BR,
Kewen
----
	PR rtl-optimization/108273

gcc/ChangeLog:

	* haifa-sched.cc (no_real_insns_p): Rename to ...
	(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
	* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
	no_real_nondebug_insns_p.
	* sched-int.h (no_real_insns_p): Rename to ...
	(no_real_nondebug_insns_p): ... this.
	* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
	(bb_deps_free_actions): New static variable.
	(compute_block_dependences): Skip for no_real_nondebug_insns_p.
	(resolve_forw_deps): New function.
	(free_block_dependencies): Check bb_deps_free_actions and call
	function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTICIAL.
	(compute_priorities): Replace no_real_insns_p with
	no_real_nondebug_insns_p.
	(schedule_region): Replace no_real_insns_p with
	no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTICIAL if the block
	get dependencies computed before but skipped now, fix up count
	sched_rgn_n_insns for it too.  Call free_trg_info when the block
	gets scheduled, and move sched_rgn_local_finish after the loop
	of free_block_dependencies loop.
	(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
	(sched_rgn_local_finish): Free bb_deps_free_actions.
	* sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
	no_real_nondebug_insns_p.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr108273.c: New test.
---
 gcc/haifa-sched.cc                          |   9 +-
 gcc/sched-ebb.cc                            |   2 +-
 gcc/sched-int.h                             |   2 +-
 gcc/sched-rgn.cc                            | 148 +++++++++++++++-----
 gcc/sel-sched.cc                            |   3 +-
 gcc/testsuite/gcc.target/powerpc/pr108273.c |  26 ++++
 6 files changed, 150 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108273.c

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 48b53776fa9..371de486eb0 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
   *tailp = end_tail;
 }

-/* Return nonzero if there are no real insns in the range [ HEAD, TAIL ].  */
+/* Return nonzero if there are no real nondebug insns in the range
+   [ HEAD, TAIL ].  */

 int
-no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
+no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
 {
   while (head != NEXT_INSN (tail))
     {
-      if (!NOTE_P (head) && !LABEL_P (head))
+      if (!NOTE_P (head)
+	  && !LABEL_P (head)
+	  && !DEBUG_INSN_P (head))
 	return 0;
       head = NEXT_INSN (head);
     }
diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
index 3eb6a24f187..e3bb0d57159 100644
--- a/gcc/sched-ebb.cc
+++ b/gcc/sched-ebb.cc
@@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
   first_bb = BLOCK_FOR_INSN (head);
   last_bb = BLOCK_FOR_INSN (tail);

-  if (no_real_insns_p (head, tail))
+  if (no_real_nondebug_insns_p (head, tail))
     return BLOCK_FOR_INSN (tail);

   gcc_assert (INSN_P (head) && INSN_P (tail));
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 97b7d2d319b..eb4e8acec9f 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
 extern int haifa_classify_insn (const_rtx);
 extern void get_ebb_head_tail (basic_block, basic_block,
 			       rtx_insn **, rtx_insn **);
-extern int no_real_insns_p (const rtx_insn *, const rtx_insn *);
+extern int no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);

 extern int insn_sched_cost (rtx_insn *);
 extern int dep_cost_1 (dep_t, dw_t);
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index f2751f62450..3508a26aeb6 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -213,6 +213,22 @@ static int rgn_nr_edges;
 /* Array of size rgn_nr_edges.  */
 static edge *rgn_edges;

+/* Possible actions for dependencies freeing.  */
+enum rgn_bb_deps_free_action
+{
+  /* This block doesn't get dependencies computed so don't need to free.  */
+  RGN_BB_DEPS_FREE_NO,
+  /* This block gets scheduled normally so free dependencies as usual.  */
+  RGN_BB_DEPS_FREE_NORMAL,
+  /* This block gets skipped in scheduling but has dependencies computed early,
+     need to free the forward list articially.  */
+  RGN_BB_DEPS_FREE_ARTICIAL
+};
+
+/* For basic block i, bb_deps_free_actions[i] indicates which action needs
+   to be taken for freeing its dependencies.  */
+static enum rgn_bb_deps_free_action *bb_deps_free_actions;
+
 /* Mapping from each edge in the graph to its number in the rgn.  */
 #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
 #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
@@ -2730,6 +2746,15 @@ compute_block_dependences (int bb)
   gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

+  /* Don't compute block dependencies if there are no real nondebug insns.  */
+  if (no_real_nondebug_insns_p (head, tail))
+    {
+      if (current_nr_blocks > 1)
+	propagate_deps (bb, &tmp_deps);
+      free_deps (&tmp_deps);
+      return;
+    }
+
   sched_analyze (&tmp_deps, head, tail);

   add_branch_dependences (head, tail);
@@ -2744,6 +2769,24 @@ compute_block_dependences (int bb)
     targetm.sched.dependencies_evaluation_hook (head, tail);
 }

+/* Artificially resolve forward dependencies for instructions HEAD to TAIL.  */
+
+static void
+resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
+{
+  rtx_insn *insn;
+  rtx_insn *next_tail = NEXT_INSN (tail);
+  sd_iterator_def sd_it;
+  dep_t dep;
+
+  /* There could be some insns which get skipped in scheduling but we compute
+     dependencies for them previously, so make them resolved.  */
+  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
+    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
+	 sd_iterator_cond (&sd_it, &dep);)
+      sd_resolve_dep (sd_it);
+}
+
 /* Free dependencies of instructions inside BB.  */
 static void
 free_block_dependencies (int bb)
@@ -2753,9 +2796,12 @@ free_block_dependencies (int bb)

   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

-  if (no_real_insns_p (head, tail))
+  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
     return;

+  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTICIAL)
+    resolve_forw_deps (head, tail);
+
   sched_free_deps (head, tail, true);
 }

@@ -3019,7 +3065,7 @@ compute_priorities (void)
       gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
       get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

-      if (no_real_insns_p (head, tail))
+      if (no_real_nondebug_insns_p (head, tail))
 	continue;

       rgn_n_insns += set_priorities (head, tail);
@@ -3153,7 +3199,7 @@ schedule_region (int rgn)

 	  get_ebb_head_tail (first_bb, last_bb, &head, &tail);

-	  if (no_real_insns_p (head, tail))
+	  if (no_real_nondebug_insns_p (head, tail))
 	    {
 	      gcc_assert (first_bb == last_bb);
 	      continue;
@@ -3173,44 +3219,62 @@ schedule_region (int rgn)

       get_ebb_head_tail (first_bb, last_bb, &head, &tail);

-      if (no_real_insns_p (head, tail))
+      if (no_real_nondebug_insns_p (head, tail))
 	{
 	  gcc_assert (first_bb == last_bb);
 	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
-	  continue;
+
+	  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
+	    continue;
+
+	  /* As it's not no_real_nondebug_insns_p initially, then it has some
+	     dependencies computed so free it articially.  */
+	  bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTICIAL;
 	}
+      else
+	{
+	  current_sched_info->prev_head = PREV_INSN (head);
+	  current_sched_info->next_tail = NEXT_INSN (tail);

-      current_sched_info->prev_head = PREV_INSN (head);
-      current_sched_info->next_tail = NEXT_INSN (tail);
+	  remove_notes (head, tail);

-      remove_notes (head, tail);
+	  unlink_bb_notes (first_bb, last_bb);

-      unlink_bb_notes (first_bb, last_bb);
+	  target_bb = bb;

-      target_bb = bb;
+	  gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
+	  current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;

-      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
-      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
+	  curr_bb = first_bb;
+	  if (dbg_cnt (sched_block))
+	    {
+	      int saved_last_basic_block = last_basic_block_for_fn (cfun);

-      curr_bb = first_bb;
-      if (dbg_cnt (sched_block))
-        {
-	  int saved_last_basic_block = last_basic_block_for_fn (cfun);
+	      schedule_block (&curr_bb, bb_state[first_bb->index]);
+	      gcc_assert (EBB_FIRST_BB (bb) == first_bb);
+	      sched_rgn_n_insns += sched_n_insns;
+	      realloc_bb_state_array (saved_last_basic_block);
+	      save_state_for_fallthru_edge (last_bb, curr_state);

-	  schedule_block (&curr_bb, bb_state[first_bb->index]);
-	  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
-	  sched_rgn_n_insns += sched_n_insns;
-	  realloc_bb_state_array (saved_last_basic_block);
-	  save_state_for_fallthru_edge (last_bb, curr_state);
-        }
-      else
-        {
-          sched_rgn_n_insns += rgn_n_insns;
-        }
+	      /* Clean up.  */
+	      if (current_nr_blocks > 1)
+		free_trg_info ();
+	    }
+	  else
+	    bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTICIAL;
+	}

-      /* Clean up.  */
-      if (current_nr_blocks > 1)
-	free_trg_info ();
+      /* We have counted this block when computing rgn_n_insns
+	 previously, so need to fix up sched_rgn_n_insns now.  */
+      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTICIAL)
+	{
+	  while (head != NEXT_INSN (tail))
+	    {
+	      if (INSN_P (head))
+		sched_rgn_n_insns++;
+	      head = NEXT_INSN (head);
+	    }
+	}
     }

   /* Sanity check: verify that all region insns were scheduled.  */
@@ -3218,13 +3282,13 @@ schedule_region (int rgn)

   sched_finish_ready_list ();

-  /* Done with this region.  */
-  sched_rgn_local_finish ();
-
   /* Free dependencies.  */
   for (bb = 0; bb < current_nr_blocks; ++bb)
     free_block_dependencies (bb);

+  /* Done with this region.  */
+  sched_rgn_local_finish ();
+
   gcc_assert (haifa_recovery_bb_ever_added_p
 	      || deps_pools_are_empty_p ());
 }
@@ -3445,6 +3509,19 @@ sched_rgn_local_init (int rgn)
 	    e->aux = NULL;
         }
     }
+
+  /* Initialize bb_deps_free_actions.  */
+  bb_deps_free_actions
+    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
+  for (bb = 0; bb < current_nr_blocks; bb++)
+    {
+      rtx_insn *head, *tail;
+      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+      if (no_real_nondebug_insns_p (head, tail))
+	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
+      else
+	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
+    }
 }

 /* Free data computed for the finished region.  */
@@ -3462,9 +3539,12 @@ sched_rgn_local_free (void)
 void
 sched_rgn_local_finish (void)
 {
-  if (current_nr_blocks > 1 && !sel_sched_p ())
+  if (!sel_sched_p ())
     {
-      sched_rgn_local_free ();
+      if (current_nr_blocks > 1)
+	sched_rgn_local_free ();
+
+      free (bb_deps_free_actions);
     }
 }

diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
index cb1cf75bfe4..04415590455 100644
--- a/gcc/sel-sched.cc
+++ b/gcc/sel-sched.cc
@@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)

       find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);

-      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
+      if (no_real_nondebug_insns_p (current_sched_info->head,
+				    current_sched_info->tail))
 	continue;

       if (reset_sched_cycles_p)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
new file mode 100644
index 00000000000..937224eaa69
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
@@ -0,0 +1,26 @@
+/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
+/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
+
+/* Verify there is no ICE.  */
+
+int a, b, c, e, f;
+float d;
+
+void
+g ()
+{
+  float h, i[1];
+  for (; f;)
+    if (c)
+      {
+	d *e;
+	if (b)
+	  {
+	    float *j = i;
+	    j[0] += 0;
+	  }
+	h += d;
+      }
+  if (h)
+    a = i[0];
+}
--
2.25.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]
  2023-03-29  7:18     ` [PATCH v2] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273] Kewen.Lin
@ 2023-05-17  6:20       ` Kewen.Lin
  2023-06-15  6:39         ` PING^2 " Kewen.Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Kewen.Lin @ 2023-05-17  6:20 UTC (permalink / raw)
  To: GCC Patches
  Cc: Segher Boessenkool, Peter Bergner, Richard Biener, Jeff Law,
	Vladimir Makarov, Richard Sandiford, Alexander Monakov

Hi,

I'd like to gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html

BR,
Kewen

on 2023/3/29 15:18, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> By addressing Alexander's comments, against v1 this
> patch v2 mainly:
> 
>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>   - Introduce enum rgn_bb_deps_free_action for three
>     kinds of actions to free deps;
>   - Change function free_deps_for_bb_no_real_insns_p to
>     resolve_forw_deps which only focuses on forward deps;
>   - Extend the handlings to cover dbg-cnt sched_block,
>     add one test case for it;
>   - Move free_trg_info call in schedule_region to an
>     appropriate place.
> 
> One thing I'm not sure about is the change in function
> sched_rgn_local_finish, currently the invocation to
> sched_rgn_local_free is guarded with !sel_sched_p (),
> so I just follow it, but the initialization of those
> structures (in sched_rgn_local_init) isn't guarded
> with !sel_sched_p (), it looks odd.
> 
> ----
> 
> As PR108273 shows, when there is one block which only has
> NOTE_P and LABEL_P insns at non-debug mode while has some
> extra DEBUG_INSN_P insns at debug mode, after scheduling
> it, the DFA states would be different between debug mode
> and non-debug mode.  Since at non-debug mode, the block
> meets no_real_insns_p, it gets skipped; while at debug
> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
> and DEBUG_INSN_P, the call of function advance_one_cycle
> will change the DFA state.  PR108519 also shows this issue
> issue can be exposed by some scheduler changes.
> 
> This patch is to change function no_real_insns_p to
> function no_real_nondebug_insns_p by taking debug insn into
> account, which make us not try to schedule for the block
> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
> resulting in consistent DFA states between non-debug and
> debug mode.
> 
> Changing no_real_insns_p to no_real_nondebug_insns_p caused
> ICE when doing free_block_dependencies, the root cause is
> that we create dependencies for debug insns, those
> dependencies are expected to be resolved during scheduling
> insns, but which gets skipped after this change.
> By checking the code, it looks it's reasonable to skip to
> compute block dependences for no_real_nondebug_insns_p
> blocks.  There is also another issue, which gets exposed
> in SPEC2017 bmks build at option -O2 -g, is that we could
> skip to schedule some block, which already gets dependency
> graph built so has dependencies computed and rgn_n_insns
> accumulated, then the later verification on if the graph
> becomes exhausted by scheduling would fail as follow:
> 
>   /* Sanity check: verify that all region insns were
>      scheduled.  */
>     gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> 
> , and also some forward deps aren't resovled.
> 
> As Alexander pointed out, the current debug count handling
> also suffers the similar issue, so this patch handles these
> two cases together: one is for some block gets skipped by
> !dbg_cnt (sched_block), the other is for some block which
> is not no_real_nondebug_insns_p initially but becomes
> no_real_nondebug_insns_p due to speculative scheduling.
> 
> This patch can be bootstrapped and regress-tested on
> x86_64-redhat-linux, aarch64-linux-gnu and
> powerpc64{,le}-linux-gnu.
> 
> I also verified this patch can pass SPEC2017 both intrate
> and fprate bmks building at -g -O2/-O3.
> 
> Any thoughts?
> 
> BR,
> Kewen
> ----
> 	PR rtl-optimization/108273
> 
> gcc/ChangeLog:
> 
> 	* haifa-sched.cc (no_real_insns_p): Rename to ...
> 	(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
> 	* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
> 	no_real_nondebug_insns_p.
> 	* sched-int.h (no_real_insns_p): Rename to ...
> 	(no_real_nondebug_insns_p): ... this.
> 	* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
> 	(bb_deps_free_actions): New static variable.
> 	(compute_block_dependences): Skip for no_real_nondebug_insns_p.
> 	(resolve_forw_deps): New function.
> 	(free_block_dependencies): Check bb_deps_free_actions and call
> 	function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTICIAL.
> 	(compute_priorities): Replace no_real_insns_p with
> 	no_real_nondebug_insns_p.
> 	(schedule_region): Replace no_real_insns_p with
> 	no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTICIAL if the block
> 	get dependencies computed before but skipped now, fix up count
> 	sched_rgn_n_insns for it too.  Call free_trg_info when the block
> 	gets scheduled, and move sched_rgn_local_finish after the loop
> 	of free_block_dependencies loop.
> 	(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
> 	(sched_rgn_local_finish): Free bb_deps_free_actions.
> 	* sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
> 	no_real_nondebug_insns_p.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/pr108273.c: New test.
> ---
>  gcc/haifa-sched.cc                          |   9 +-
>  gcc/sched-ebb.cc                            |   2 +-
>  gcc/sched-int.h                             |   2 +-
>  gcc/sched-rgn.cc                            | 148 +++++++++++++++-----
>  gcc/sel-sched.cc                            |   3 +-
>  gcc/testsuite/gcc.target/powerpc/pr108273.c |  26 ++++
>  6 files changed, 150 insertions(+), 40 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108273.c
> 
> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
> index 48b53776fa9..371de486eb0 100644
> --- a/gcc/haifa-sched.cc
> +++ b/gcc/haifa-sched.cc
> @@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
>    *tailp = end_tail;
>  }
> 
> -/* Return nonzero if there are no real insns in the range [ HEAD, TAIL ].  */
> +/* Return nonzero if there are no real nondebug insns in the range
> +   [ HEAD, TAIL ].  */
> 
>  int
> -no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
> +no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
>  {
>    while (head != NEXT_INSN (tail))
>      {
> -      if (!NOTE_P (head) && !LABEL_P (head))
> +      if (!NOTE_P (head)
> +	  && !LABEL_P (head)
> +	  && !DEBUG_INSN_P (head))
>  	return 0;
>        head = NEXT_INSN (head);
>      }
> diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
> index 3eb6a24f187..e3bb0d57159 100644
> --- a/gcc/sched-ebb.cc
> +++ b/gcc/sched-ebb.cc
> @@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
>    first_bb = BLOCK_FOR_INSN (head);
>    last_bb = BLOCK_FOR_INSN (tail);
> 
> -  if (no_real_insns_p (head, tail))
> +  if (no_real_nondebug_insns_p (head, tail))
>      return BLOCK_FOR_INSN (tail);
> 
>    gcc_assert (INSN_P (head) && INSN_P (tail));
> diff --git a/gcc/sched-int.h b/gcc/sched-int.h
> index 97b7d2d319b..eb4e8acec9f 100644
> --- a/gcc/sched-int.h
> +++ b/gcc/sched-int.h
> @@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
>  extern int haifa_classify_insn (const_rtx);
>  extern void get_ebb_head_tail (basic_block, basic_block,
>  			       rtx_insn **, rtx_insn **);
> -extern int no_real_insns_p (const rtx_insn *, const rtx_insn *);
> +extern int no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);
> 
>  extern int insn_sched_cost (rtx_insn *);
>  extern int dep_cost_1 (dep_t, dw_t);
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index f2751f62450..3508a26aeb6 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -213,6 +213,22 @@ static int rgn_nr_edges;
>  /* Array of size rgn_nr_edges.  */
>  static edge *rgn_edges;
> 
> +/* Possible actions for dependencies freeing.  */
> +enum rgn_bb_deps_free_action
> +{
> +  /* This block doesn't get dependencies computed so don't need to free.  */
> +  RGN_BB_DEPS_FREE_NO,
> +  /* This block gets scheduled normally so free dependencies as usual.  */
> +  RGN_BB_DEPS_FREE_NORMAL,
> +  /* This block gets skipped in scheduling but has dependencies computed early,
> +     need to free the forward list articially.  */
> +  RGN_BB_DEPS_FREE_ARTICIAL
> +};
> +
> +/* For basic block i, bb_deps_free_actions[i] indicates which action needs
> +   to be taken for freeing its dependencies.  */
> +static enum rgn_bb_deps_free_action *bb_deps_free_actions;
> +
>  /* Mapping from each edge in the graph to its number in the rgn.  */
>  #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>  #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
> @@ -2730,6 +2746,15 @@ compute_block_dependences (int bb)
>    gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> 
> +  /* Don't compute block dependencies if there are no real nondebug insns.  */
> +  if (no_real_nondebug_insns_p (head, tail))
> +    {
> +      if (current_nr_blocks > 1)
> +	propagate_deps (bb, &tmp_deps);
> +      free_deps (&tmp_deps);
> +      return;
> +    }
> +
>    sched_analyze (&tmp_deps, head, tail);
> 
>    add_branch_dependences (head, tail);
> @@ -2744,6 +2769,24 @@ compute_block_dependences (int bb)
>      targetm.sched.dependencies_evaluation_hook (head, tail);
>  }
> 
> +/* Artificially resolve forward dependencies for instructions HEAD to TAIL.  */
> +
> +static void
> +resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
> +{
> +  rtx_insn *insn;
> +  rtx_insn *next_tail = NEXT_INSN (tail);
> +  sd_iterator_def sd_it;
> +  dep_t dep;
> +
> +  /* There could be some insns which get skipped in scheduling but we compute
> +     dependencies for them previously, so make them resolved.  */
> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
> +    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
> +	 sd_iterator_cond (&sd_it, &dep);)
> +      sd_resolve_dep (sd_it);
> +}
> +
>  /* Free dependencies of instructions inside BB.  */
>  static void
>  free_block_dependencies (int bb)
> @@ -2753,9 +2796,12 @@ free_block_dependencies (int bb)
> 
>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> 
> -  if (no_real_insns_p (head, tail))
> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>      return;
> 
> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTICIAL)
> +    resolve_forw_deps (head, tail);
> +
>    sched_free_deps (head, tail, true);
>  }
> 
> @@ -3019,7 +3065,7 @@ compute_priorities (void)
>        gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>        get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> 
> -      if (no_real_insns_p (head, tail))
> +      if (no_real_nondebug_insns_p (head, tail))
>  	continue;
> 
>        rgn_n_insns += set_priorities (head, tail);
> @@ -3153,7 +3199,7 @@ schedule_region (int rgn)
> 
>  	  get_ebb_head_tail (first_bb, last_bb, &head, &tail);
> 
> -	  if (no_real_insns_p (head, tail))
> +	  if (no_real_nondebug_insns_p (head, tail))
>  	    {
>  	      gcc_assert (first_bb == last_bb);
>  	      continue;
> @@ -3173,44 +3219,62 @@ schedule_region (int rgn)
> 
>        get_ebb_head_tail (first_bb, last_bb, &head, &tail);
> 
> -      if (no_real_insns_p (head, tail))
> +      if (no_real_nondebug_insns_p (head, tail))
>  	{
>  	  gcc_assert (first_bb == last_bb);
>  	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
> -	  continue;
> +
> +	  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
> +	    continue;
> +
> +	  /* As it's not no_real_nondebug_insns_p initially, then it has some
> +	     dependencies computed so free it articially.  */
> +	  bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTICIAL;
>  	}
> +      else
> +	{
> +	  current_sched_info->prev_head = PREV_INSN (head);
> +	  current_sched_info->next_tail = NEXT_INSN (tail);
> 
> -      current_sched_info->prev_head = PREV_INSN (head);
> -      current_sched_info->next_tail = NEXT_INSN (tail);
> +	  remove_notes (head, tail);
> 
> -      remove_notes (head, tail);
> +	  unlink_bb_notes (first_bb, last_bb);
> 
> -      unlink_bb_notes (first_bb, last_bb);
> +	  target_bb = bb;
> 
> -      target_bb = bb;
> +	  gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
> +	  current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
> 
> -      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
> -      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
> +	  curr_bb = first_bb;
> +	  if (dbg_cnt (sched_block))
> +	    {
> +	      int saved_last_basic_block = last_basic_block_for_fn (cfun);
> 
> -      curr_bb = first_bb;
> -      if (dbg_cnt (sched_block))
> -        {
> -	  int saved_last_basic_block = last_basic_block_for_fn (cfun);
> +	      schedule_block (&curr_bb, bb_state[first_bb->index]);
> +	      gcc_assert (EBB_FIRST_BB (bb) == first_bb);
> +	      sched_rgn_n_insns += sched_n_insns;
> +	      realloc_bb_state_array (saved_last_basic_block);
> +	      save_state_for_fallthru_edge (last_bb, curr_state);
> 
> -	  schedule_block (&curr_bb, bb_state[first_bb->index]);
> -	  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
> -	  sched_rgn_n_insns += sched_n_insns;
> -	  realloc_bb_state_array (saved_last_basic_block);
> -	  save_state_for_fallthru_edge (last_bb, curr_state);
> -        }
> -      else
> -        {
> -          sched_rgn_n_insns += rgn_n_insns;
> -        }
> +	      /* Clean up.  */
> +	      if (current_nr_blocks > 1)
> +		free_trg_info ();
> +	    }
> +	  else
> +	    bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTICIAL;
> +	}
> 
> -      /* Clean up.  */
> -      if (current_nr_blocks > 1)
> -	free_trg_info ();
> +      /* We have counted this block when computing rgn_n_insns
> +	 previously, so need to fix up sched_rgn_n_insns now.  */
> +      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTICIAL)
> +	{
> +	  while (head != NEXT_INSN (tail))
> +	    {
> +	      if (INSN_P (head))
> +		sched_rgn_n_insns++;
> +	      head = NEXT_INSN (head);
> +	    }
> +	}
>      }
> 
>    /* Sanity check: verify that all region insns were scheduled.  */
> @@ -3218,13 +3282,13 @@ schedule_region (int rgn)
> 
>    sched_finish_ready_list ();
> 
> -  /* Done with this region.  */
> -  sched_rgn_local_finish ();
> -
>    /* Free dependencies.  */
>    for (bb = 0; bb < current_nr_blocks; ++bb)
>      free_block_dependencies (bb);
> 
> +  /* Done with this region.  */
> +  sched_rgn_local_finish ();
> +
>    gcc_assert (haifa_recovery_bb_ever_added_p
>  	      || deps_pools_are_empty_p ());
>  }
> @@ -3445,6 +3509,19 @@ sched_rgn_local_init (int rgn)
>  	    e->aux = NULL;
>          }
>      }
> +
> +  /* Initialize bb_deps_free_actions.  */
> +  bb_deps_free_actions
> +    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
> +  for (bb = 0; bb < current_nr_blocks; bb++)
> +    {
> +      rtx_insn *head, *tail;
> +      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
> +      if (no_real_nondebug_insns_p (head, tail))
> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
> +      else
> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
> +    }
>  }
> 
>  /* Free data computed for the finished region.  */
> @@ -3462,9 +3539,12 @@ sched_rgn_local_free (void)
>  void
>  sched_rgn_local_finish (void)
>  {
> -  if (current_nr_blocks > 1 && !sel_sched_p ())
> +  if (!sel_sched_p ())
>      {
> -      sched_rgn_local_free ();
> +      if (current_nr_blocks > 1)
> +	sched_rgn_local_free ();
> +
> +      free (bb_deps_free_actions);
>      }
>  }
> 
> diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
> index cb1cf75bfe4..04415590455 100644
> --- a/gcc/sel-sched.cc
> +++ b/gcc/sel-sched.cc
> @@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)
> 
>        find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);
> 
> -      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
> +      if (no_real_nondebug_insns_p (current_sched_info->head,
> +				    current_sched_info->tail))
>  	continue;
> 
>        if (reset_sched_cycles_p)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
> new file mode 100644
> index 00000000000..937224eaa69
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
> @@ -0,0 +1,26 @@
> +/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
> +/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
> +
> +/* Verify there is no ICE.  */
> +
> +int a, b, c, e, f;
> +float d;
> +
> +void
> +g ()
> +{
> +  float h, i[1];
> +  for (; f;)
> +    if (c)
> +      {
> +	d *e;
> +	if (b)
> +	  {
> +	    float *j = i;
> +	    j[0] += 0;
> +	  }
> +	h += d;
> +      }
> +  if (h)
> +    a = i[0];
> +}
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* PING^2 [PATCH v2] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]
  2023-05-17  6:20       ` Kewen.Lin
@ 2023-06-15  6:39         ` Kewen.Lin
  2023-08-07 10:07           ` PING^3 " Kewen.Lin
  0 siblings, 1 reply; 7+ messages in thread
From: Kewen.Lin @ 2023-06-15  6:39 UTC (permalink / raw)
  To: GCC Patches
  Cc: Segher Boessenkool, Peter Bergner, Richard Biener, Jeff Law,
	Vladimir Makarov, Richard Sandiford, Alexander Monakov

Hi,

I'd like to gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html

BR,
Kewen

> on 2023/3/29 15:18, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> By addressing Alexander's comments, against v1 this
>> patch v2 mainly:
>>
>>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>>   - Introduce enum rgn_bb_deps_free_action for three
>>     kinds of actions to free deps;
>>   - Change function free_deps_for_bb_no_real_insns_p to
>>     resolve_forw_deps which only focuses on forward deps;
>>   - Extend the handlings to cover dbg-cnt sched_block,
>>     add one test case for it;
>>   - Move free_trg_info call in schedule_region to an
>>     appropriate place.
>>
>> One thing I'm not sure about is the change in function
>> sched_rgn_local_finish, currently the invocation to
>> sched_rgn_local_free is guarded with !sel_sched_p (),
>> so I just follow it, but the initialization of those
>> structures (in sched_rgn_local_init) isn't guarded
>> with !sel_sched_p (), it looks odd.
>>
>> ----
>>
>> As PR108273 shows, when there is one block which only has
>> NOTE_P and LABEL_P insns at non-debug mode while has some
>> extra DEBUG_INSN_P insns at debug mode, after scheduling
>> it, the DFA states would be different between debug mode
>> and non-debug mode.  Since at non-debug mode, the block
>> meets no_real_insns_p, it gets skipped; while at debug
>> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
>> and DEBUG_INSN_P, the call of function advance_one_cycle
>> will change the DFA state.  PR108519 also shows this issue
>> issue can be exposed by some scheduler changes.
>>
>> This patch is to change function no_real_insns_p to
>> function no_real_nondebug_insns_p by taking debug insn into
>> account, which make us not try to schedule for the block
>> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>> resulting in consistent DFA states between non-debug and
>> debug mode.
>>
>> Changing no_real_insns_p to no_real_nondebug_insns_p caused
>> ICE when doing free_block_dependencies, the root cause is
>> that we create dependencies for debug insns, those
>> dependencies are expected to be resolved during scheduling
>> insns, but which gets skipped after this change.
>> By checking the code, it looks it's reasonable to skip to
>> compute block dependences for no_real_nondebug_insns_p
>> blocks.  There is also another issue, which gets exposed
>> in SPEC2017 bmks build at option -O2 -g, is that we could
>> skip to schedule some block, which already gets dependency
>> graph built so has dependencies computed and rgn_n_insns
>> accumulated, then the later verification on if the graph
>> becomes exhausted by scheduling would fail as follow:
>>
>>   /* Sanity check: verify that all region insns were
>>      scheduled.  */
>>     gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>>
>> , and also some forward deps aren't resovled.
>>
>> As Alexander pointed out, the current debug count handling
>> also suffers the similar issue, so this patch handles these
>> two cases together: one is for some block gets skipped by
>> !dbg_cnt (sched_block), the other is for some block which
>> is not no_real_nondebug_insns_p initially but becomes
>> no_real_nondebug_insns_p due to speculative scheduling.
>>
>> This patch can be bootstrapped and regress-tested on
>> x86_64-redhat-linux, aarch64-linux-gnu and
>> powerpc64{,le}-linux-gnu.
>>
>> I also verified this patch can pass SPEC2017 both intrate
>> and fprate bmks building at -g -O2/-O3.
>>
>> Any thoughts?
>>
>> BR,
>> Kewen
>> ----
>> 	PR rtl-optimization/108273
>>
>> gcc/ChangeLog:
>>
>> 	* haifa-sched.cc (no_real_insns_p): Rename to ...
>> 	(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>> 	* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>> 	no_real_nondebug_insns_p.
>> 	* sched-int.h (no_real_insns_p): Rename to ...
>> 	(no_real_nondebug_insns_p): ... this.
>> 	* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>> 	(bb_deps_free_actions): New static variable.
>> 	(compute_block_dependences): Skip for no_real_nondebug_insns_p.
>> 	(resolve_forw_deps): New function.
>> 	(free_block_dependencies): Check bb_deps_free_actions and call
>> 	function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTICIAL.
>> 	(compute_priorities): Replace no_real_insns_p with
>> 	no_real_nondebug_insns_p.
>> 	(schedule_region): Replace no_real_insns_p with
>> 	no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTICIAL if the block
>> 	get dependencies computed before but skipped now, fix up count
>> 	sched_rgn_n_insns for it too.  Call free_trg_info when the block
>> 	gets scheduled, and move sched_rgn_local_finish after the loop
>> 	of free_block_dependencies loop.
>> 	(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
>> 	(sched_rgn_local_finish): Free bb_deps_free_actions.
>> 	* sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
>> 	no_real_nondebug_insns_p.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/powerpc/pr108273.c: New test.
>> ---
>>  gcc/haifa-sched.cc                          |   9 +-
>>  gcc/sched-ebb.cc                            |   2 +-
>>  gcc/sched-int.h                             |   2 +-
>>  gcc/sched-rgn.cc                            | 148 +++++++++++++++-----
>>  gcc/sel-sched.cc                            |   3 +-
>>  gcc/testsuite/gcc.target/powerpc/pr108273.c |  26 ++++
>>  6 files changed, 150 insertions(+), 40 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108273.c
>>
>> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
>> index 48b53776fa9..371de486eb0 100644
>> --- a/gcc/haifa-sched.cc
>> +++ b/gcc/haifa-sched.cc
>> @@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
>>    *tailp = end_tail;
>>  }
>>
>> -/* Return nonzero if there are no real insns in the range [ HEAD, TAIL ].  */
>> +/* Return nonzero if there are no real nondebug insns in the range
>> +   [ HEAD, TAIL ].  */
>>
>>  int
>> -no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
>> +no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
>>  {
>>    while (head != NEXT_INSN (tail))
>>      {
>> -      if (!NOTE_P (head) && !LABEL_P (head))
>> +      if (!NOTE_P (head)
>> +	  && !LABEL_P (head)
>> +	  && !DEBUG_INSN_P (head))
>>  	return 0;
>>        head = NEXT_INSN (head);
>>      }
>> diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
>> index 3eb6a24f187..e3bb0d57159 100644
>> --- a/gcc/sched-ebb.cc
>> +++ b/gcc/sched-ebb.cc
>> @@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
>>    first_bb = BLOCK_FOR_INSN (head);
>>    last_bb = BLOCK_FOR_INSN (tail);
>>
>> -  if (no_real_insns_p (head, tail))
>> +  if (no_real_nondebug_insns_p (head, tail))
>>      return BLOCK_FOR_INSN (tail);
>>
>>    gcc_assert (INSN_P (head) && INSN_P (tail));
>> diff --git a/gcc/sched-int.h b/gcc/sched-int.h
>> index 97b7d2d319b..eb4e8acec9f 100644
>> --- a/gcc/sched-int.h
>> +++ b/gcc/sched-int.h
>> @@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
>>  extern int haifa_classify_insn (const_rtx);
>>  extern void get_ebb_head_tail (basic_block, basic_block,
>>  			       rtx_insn **, rtx_insn **);
>> -extern int no_real_insns_p (const rtx_insn *, const rtx_insn *);
>> +extern int no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);
>>
>>  extern int insn_sched_cost (rtx_insn *);
>>  extern int dep_cost_1 (dep_t, dw_t);
>> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
>> index f2751f62450..3508a26aeb6 100644
>> --- a/gcc/sched-rgn.cc
>> +++ b/gcc/sched-rgn.cc
>> @@ -213,6 +213,22 @@ static int rgn_nr_edges;
>>  /* Array of size rgn_nr_edges.  */
>>  static edge *rgn_edges;
>>
>> +/* Possible actions for dependencies freeing.  */
>> +enum rgn_bb_deps_free_action
>> +{
>> +  /* This block doesn't get dependencies computed so don't need to free.  */
>> +  RGN_BB_DEPS_FREE_NO,
>> +  /* This block gets scheduled normally so free dependencies as usual.  */
>> +  RGN_BB_DEPS_FREE_NORMAL,
>> +  /* This block gets skipped in scheduling but has dependencies computed early,
>> +     need to free the forward list articially.  */
>> +  RGN_BB_DEPS_FREE_ARTICIAL
>> +};
>> +
>> +/* For basic block i, bb_deps_free_actions[i] indicates which action needs
>> +   to be taken for freeing its dependencies.  */
>> +static enum rgn_bb_deps_free_action *bb_deps_free_actions;
>> +
>>  /* Mapping from each edge in the graph to its number in the rgn.  */
>>  #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>>  #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
>> @@ -2730,6 +2746,15 @@ compute_block_dependences (int bb)
>>    gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>
>> +  /* Don't compute block dependencies if there are no real nondebug insns.  */
>> +  if (no_real_nondebug_insns_p (head, tail))
>> +    {
>> +      if (current_nr_blocks > 1)
>> +	propagate_deps (bb, &tmp_deps);
>> +      free_deps (&tmp_deps);
>> +      return;
>> +    }
>> +
>>    sched_analyze (&tmp_deps, head, tail);
>>
>>    add_branch_dependences (head, tail);
>> @@ -2744,6 +2769,24 @@ compute_block_dependences (int bb)
>>      targetm.sched.dependencies_evaluation_hook (head, tail);
>>  }
>>
>> +/* Artificially resolve forward dependencies for instructions HEAD to TAIL.  */
>> +
>> +static void
>> +resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
>> +{
>> +  rtx_insn *insn;
>> +  rtx_insn *next_tail = NEXT_INSN (tail);
>> +  sd_iterator_def sd_it;
>> +  dep_t dep;
>> +
>> +  /* There could be some insns which get skipped in scheduling but we compute
>> +     dependencies for them previously, so make them resolved.  */
>> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
>> +    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
>> +	 sd_iterator_cond (&sd_it, &dep);)
>> +      sd_resolve_dep (sd_it);
>> +}
>> +
>>  /* Free dependencies of instructions inside BB.  */
>>  static void
>>  free_block_dependencies (int bb)
>> @@ -2753,9 +2796,12 @@ free_block_dependencies (int bb)
>>
>>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>
>> -  if (no_real_insns_p (head, tail))
>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>>      return;
>>
>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTICIAL)
>> +    resolve_forw_deps (head, tail);
>> +
>>    sched_free_deps (head, tail, true);
>>  }
>>
>> @@ -3019,7 +3065,7 @@ compute_priorities (void)
>>        gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>        get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>
>> -      if (no_real_insns_p (head, tail))
>> +      if (no_real_nondebug_insns_p (head, tail))
>>  	continue;
>>
>>        rgn_n_insns += set_priorities (head, tail);
>> @@ -3153,7 +3199,7 @@ schedule_region (int rgn)
>>
>>  	  get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>>
>> -	  if (no_real_insns_p (head, tail))
>> +	  if (no_real_nondebug_insns_p (head, tail))
>>  	    {
>>  	      gcc_assert (first_bb == last_bb);
>>  	      continue;
>> @@ -3173,44 +3219,62 @@ schedule_region (int rgn)
>>
>>        get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>>
>> -      if (no_real_insns_p (head, tail))
>> +      if (no_real_nondebug_insns_p (head, tail))
>>  	{
>>  	  gcc_assert (first_bb == last_bb);
>>  	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
>> -	  continue;
>> +
>> +	  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>> +	    continue;
>> +
>> +	  /* As it's not no_real_nondebug_insns_p initially, then it has some
>> +	     dependencies computed so free it articially.  */
>> +	  bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTICIAL;
>>  	}
>> +      else
>> +	{
>> +	  current_sched_info->prev_head = PREV_INSN (head);
>> +	  current_sched_info->next_tail = NEXT_INSN (tail);
>>
>> -      current_sched_info->prev_head = PREV_INSN (head);
>> -      current_sched_info->next_tail = NEXT_INSN (tail);
>> +	  remove_notes (head, tail);
>>
>> -      remove_notes (head, tail);
>> +	  unlink_bb_notes (first_bb, last_bb);
>>
>> -      unlink_bb_notes (first_bb, last_bb);
>> +	  target_bb = bb;
>>
>> -      target_bb = bb;
>> +	  gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
>> +	  current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>>
>> -      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
>> -      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>> +	  curr_bb = first_bb;
>> +	  if (dbg_cnt (sched_block))
>> +	    {
>> +	      int saved_last_basic_block = last_basic_block_for_fn (cfun);
>>
>> -      curr_bb = first_bb;
>> -      if (dbg_cnt (sched_block))
>> -        {
>> -	  int saved_last_basic_block = last_basic_block_for_fn (cfun);
>> +	      schedule_block (&curr_bb, bb_state[first_bb->index]);
>> +	      gcc_assert (EBB_FIRST_BB (bb) == first_bb);
>> +	      sched_rgn_n_insns += sched_n_insns;
>> +	      realloc_bb_state_array (saved_last_basic_block);
>> +	      save_state_for_fallthru_edge (last_bb, curr_state);
>>
>> -	  schedule_block (&curr_bb, bb_state[first_bb->index]);
>> -	  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
>> -	  sched_rgn_n_insns += sched_n_insns;
>> -	  realloc_bb_state_array (saved_last_basic_block);
>> -	  save_state_for_fallthru_edge (last_bb, curr_state);
>> -        }
>> -      else
>> -        {
>> -          sched_rgn_n_insns += rgn_n_insns;
>> -        }
>> +	      /* Clean up.  */
>> +	      if (current_nr_blocks > 1)
>> +		free_trg_info ();
>> +	    }
>> +	  else
>> +	    bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTICIAL;
>> +	}
>>
>> -      /* Clean up.  */
>> -      if (current_nr_blocks > 1)
>> -	free_trg_info ();
>> +      /* We have counted this block when computing rgn_n_insns
>> +	 previously, so need to fix up sched_rgn_n_insns now.  */
>> +      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTICIAL)
>> +	{
>> +	  while (head != NEXT_INSN (tail))
>> +	    {
>> +	      if (INSN_P (head))
>> +		sched_rgn_n_insns++;
>> +	      head = NEXT_INSN (head);
>> +	    }
>> +	}
>>      }
>>
>>    /* Sanity check: verify that all region insns were scheduled.  */
>> @@ -3218,13 +3282,13 @@ schedule_region (int rgn)
>>
>>    sched_finish_ready_list ();
>>
>> -  /* Done with this region.  */
>> -  sched_rgn_local_finish ();
>> -
>>    /* Free dependencies.  */
>>    for (bb = 0; bb < current_nr_blocks; ++bb)
>>      free_block_dependencies (bb);
>>
>> +  /* Done with this region.  */
>> +  sched_rgn_local_finish ();
>> +
>>    gcc_assert (haifa_recovery_bb_ever_added_p
>>  	      || deps_pools_are_empty_p ());
>>  }
>> @@ -3445,6 +3509,19 @@ sched_rgn_local_init (int rgn)
>>  	    e->aux = NULL;
>>          }
>>      }
>> +
>> +  /* Initialize bb_deps_free_actions.  */
>> +  bb_deps_free_actions
>> +    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
>> +  for (bb = 0; bb < current_nr_blocks; bb++)
>> +    {
>> +      rtx_insn *head, *tail;
>> +      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>> +      if (no_real_nondebug_insns_p (head, tail))
>> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
>> +      else
>> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
>> +    }
>>  }
>>
>>  /* Free data computed for the finished region.  */
>> @@ -3462,9 +3539,12 @@ sched_rgn_local_free (void)
>>  void
>>  sched_rgn_local_finish (void)
>>  {
>> -  if (current_nr_blocks > 1 && !sel_sched_p ())
>> +  if (!sel_sched_p ())
>>      {
>> -      sched_rgn_local_free ();
>> +      if (current_nr_blocks > 1)
>> +	sched_rgn_local_free ();
>> +
>> +      free (bb_deps_free_actions);
>>      }
>>  }
>>
>> diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
>> index cb1cf75bfe4..04415590455 100644
>> --- a/gcc/sel-sched.cc
>> +++ b/gcc/sel-sched.cc
>> @@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)
>>
>>        find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);
>>
>> -      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
>> +      if (no_real_nondebug_insns_p (current_sched_info->head,
>> +				    current_sched_info->tail))
>>  	continue;
>>
>>        if (reset_sched_cycles_p)
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
>> new file mode 100644
>> index 00000000000..937224eaa69
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
>> +/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
>> +
>> +/* Verify there is no ICE.  */
>> +
>> +int a, b, c, e, f;
>> +float d;
>> +
>> +void
>> +g ()
>> +{
>> +  float h, i[1];
>> +  for (; f;)
>> +    if (c)
>> +      {
>> +	d *e;
>> +	if (b)
>> +	  {
>> +	    float *j = i;
>> +	    j[0] += 0;
>> +	  }
>> +	h += d;
>> +      }
>> +  if (h)
>> +    a = i[0];
>> +}
>> --
>> 2.25.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* PING^3 [PATCH v2] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]
  2023-06-15  6:39         ` PING^2 " Kewen.Lin
@ 2023-08-07 10:07           ` Kewen.Lin
  0 siblings, 0 replies; 7+ messages in thread
From: Kewen.Lin @ 2023-08-07 10:07 UTC (permalink / raw)
  To: GCC Patches
  Cc: Segher Boessenkool, Peter Bergner, Richard Biener, Jeff Law,
	Vladimir Makarov, Richard Sandiford, Alexander Monakov

Hi,

I'd like to gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html

BR,
Kewen

>> on 2023/3/29 15:18, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> By addressing Alexander's comments, against v1 this
>>> patch v2 mainly:
>>>
>>>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>>>   - Introduce enum rgn_bb_deps_free_action for three
>>>     kinds of actions to free deps;
>>>   - Change function free_deps_for_bb_no_real_insns_p to
>>>     resolve_forw_deps which only focuses on forward deps;
>>>   - Extend the handlings to cover dbg-cnt sched_block,
>>>     add one test case for it;
>>>   - Move free_trg_info call in schedule_region to an
>>>     appropriate place.
>>>
>>> One thing I'm not sure about is the change in function
>>> sched_rgn_local_finish, currently the invocation to
>>> sched_rgn_local_free is guarded with !sel_sched_p (),
>>> so I just follow it, but the initialization of those
>>> structures (in sched_rgn_local_init) isn't guarded
>>> with !sel_sched_p (), it looks odd.
>>>
>>> ----
>>>
>>> As PR108273 shows, when there is one block which only has
>>> NOTE_P and LABEL_P insns at non-debug mode while has some
>>> extra DEBUG_INSN_P insns at debug mode, after scheduling
>>> it, the DFA states would be different between debug mode
>>> and non-debug mode.  Since at non-debug mode, the block
>>> meets no_real_insns_p, it gets skipped; while at debug
>>> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
>>> and DEBUG_INSN_P, the call of function advance_one_cycle
>>> will change the DFA state.  PR108519 also shows this issue
>>> issue can be exposed by some scheduler changes.
>>>
>>> This patch is to change function no_real_insns_p to
>>> function no_real_nondebug_insns_p by taking debug insn into
>>> account, which make us not try to schedule for the block
>>> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>>> resulting in consistent DFA states between non-debug and
>>> debug mode.
>>>
>>> Changing no_real_insns_p to no_real_nondebug_insns_p caused
>>> ICE when doing free_block_dependencies, the root cause is
>>> that we create dependencies for debug insns, those
>>> dependencies are expected to be resolved during scheduling
>>> insns, but which gets skipped after this change.
>>> By checking the code, it looks it's reasonable to skip to
>>> compute block dependences for no_real_nondebug_insns_p
>>> blocks.  There is also another issue, which gets exposed
>>> in SPEC2017 bmks build at option -O2 -g, is that we could
>>> skip to schedule some block, which already gets dependency
>>> graph built so has dependencies computed and rgn_n_insns
>>> accumulated, then the later verification on if the graph
>>> becomes exhausted by scheduling would fail as follow:
>>>
>>>   /* Sanity check: verify that all region insns were
>>>      scheduled.  */
>>>     gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>>>
>>> , and also some forward deps aren't resovled.
>>>
>>> As Alexander pointed out, the current debug count handling
>>> also suffers the similar issue, so this patch handles these
>>> two cases together: one is for some block gets skipped by
>>> !dbg_cnt (sched_block), the other is for some block which
>>> is not no_real_nondebug_insns_p initially but becomes
>>> no_real_nondebug_insns_p due to speculative scheduling.
>>>
>>> This patch can be bootstrapped and regress-tested on
>>> x86_64-redhat-linux, aarch64-linux-gnu and
>>> powerpc64{,le}-linux-gnu.
>>>
>>> I also verified this patch can pass SPEC2017 both intrate
>>> and fprate bmks building at -g -O2/-O3.
>>>
>>> Any thoughts?
>>>
>>> BR,
>>> Kewen
>>> ----
>>> 	PR rtl-optimization/108273
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* haifa-sched.cc (no_real_insns_p): Rename to ...
>>> 	(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>>> 	* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>>> 	no_real_nondebug_insns_p.
>>> 	* sched-int.h (no_real_insns_p): Rename to ...
>>> 	(no_real_nondebug_insns_p): ... this.
>>> 	* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>>> 	(bb_deps_free_actions): New static variable.
>>> 	(compute_block_dependences): Skip for no_real_nondebug_insns_p.
>>> 	(resolve_forw_deps): New function.
>>> 	(free_block_dependencies): Check bb_deps_free_actions and call
>>> 	function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTICIAL.
>>> 	(compute_priorities): Replace no_real_insns_p with
>>> 	no_real_nondebug_insns_p.
>>> 	(schedule_region): Replace no_real_insns_p with
>>> 	no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTICIAL if the block
>>> 	get dependencies computed before but skipped now, fix up count
>>> 	sched_rgn_n_insns for it too.  Call free_trg_info when the block
>>> 	gets scheduled, and move sched_rgn_local_finish after the loop
>>> 	of free_block_dependencies loop.
>>> 	(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
>>> 	(sched_rgn_local_finish): Free bb_deps_free_actions.
>>> 	* sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
>>> 	no_real_nondebug_insns_p.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/pr108273.c: New test.
>>> ---
>>>  gcc/haifa-sched.cc                          |   9 +-
>>>  gcc/sched-ebb.cc                            |   2 +-
>>>  gcc/sched-int.h                             |   2 +-
>>>  gcc/sched-rgn.cc                            | 148 +++++++++++++++-----
>>>  gcc/sel-sched.cc                            |   3 +-
>>>  gcc/testsuite/gcc.target/powerpc/pr108273.c |  26 ++++
>>>  6 files changed, 150 insertions(+), 40 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108273.c
>>>
>>> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
>>> index 48b53776fa9..371de486eb0 100644
>>> --- a/gcc/haifa-sched.cc
>>> +++ b/gcc/haifa-sched.cc
>>> @@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
>>>    *tailp = end_tail;
>>>  }
>>>
>>> -/* Return nonzero if there are no real insns in the range [ HEAD, TAIL ].  */
>>> +/* Return nonzero if there are no real nondebug insns in the range
>>> +   [ HEAD, TAIL ].  */
>>>
>>>  int
>>> -no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
>>> +no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
>>>  {
>>>    while (head != NEXT_INSN (tail))
>>>      {
>>> -      if (!NOTE_P (head) && !LABEL_P (head))
>>> +      if (!NOTE_P (head)
>>> +	  && !LABEL_P (head)
>>> +	  && !DEBUG_INSN_P (head))
>>>  	return 0;
>>>        head = NEXT_INSN (head);
>>>      }
>>> diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
>>> index 3eb6a24f187..e3bb0d57159 100644
>>> --- a/gcc/sched-ebb.cc
>>> +++ b/gcc/sched-ebb.cc
>>> @@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
>>>    first_bb = BLOCK_FOR_INSN (head);
>>>    last_bb = BLOCK_FOR_INSN (tail);
>>>
>>> -  if (no_real_insns_p (head, tail))
>>> +  if (no_real_nondebug_insns_p (head, tail))
>>>      return BLOCK_FOR_INSN (tail);
>>>
>>>    gcc_assert (INSN_P (head) && INSN_P (tail));
>>> diff --git a/gcc/sched-int.h b/gcc/sched-int.h
>>> index 97b7d2d319b..eb4e8acec9f 100644
>>> --- a/gcc/sched-int.h
>>> +++ b/gcc/sched-int.h
>>> @@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
>>>  extern int haifa_classify_insn (const_rtx);
>>>  extern void get_ebb_head_tail (basic_block, basic_block,
>>>  			       rtx_insn **, rtx_insn **);
>>> -extern int no_real_insns_p (const rtx_insn *, const rtx_insn *);
>>> +extern int no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);
>>>
>>>  extern int insn_sched_cost (rtx_insn *);
>>>  extern int dep_cost_1 (dep_t, dw_t);
>>> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
>>> index f2751f62450..3508a26aeb6 100644
>>> --- a/gcc/sched-rgn.cc
>>> +++ b/gcc/sched-rgn.cc
>>> @@ -213,6 +213,22 @@ static int rgn_nr_edges;
>>>  /* Array of size rgn_nr_edges.  */
>>>  static edge *rgn_edges;
>>>
>>> +/* Possible actions for dependencies freeing.  */
>>> +enum rgn_bb_deps_free_action
>>> +{
>>> +  /* This block doesn't get dependencies computed so don't need to free.  */
>>> +  RGN_BB_DEPS_FREE_NO,
>>> +  /* This block gets scheduled normally so free dependencies as usual.  */
>>> +  RGN_BB_DEPS_FREE_NORMAL,
>>> +  /* This block gets skipped in scheduling but has dependencies computed early,
>>> +     need to free the forward list articially.  */
>>> +  RGN_BB_DEPS_FREE_ARTICIAL
>>> +};
>>> +
>>> +/* For basic block i, bb_deps_free_actions[i] indicates which action needs
>>> +   to be taken for freeing its dependencies.  */
>>> +static enum rgn_bb_deps_free_action *bb_deps_free_actions;
>>> +
>>>  /* Mapping from each edge in the graph to its number in the rgn.  */
>>>  #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>>>  #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
>>> @@ -2730,6 +2746,15 @@ compute_block_dependences (int bb)
>>>    gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>>
>>> +  /* Don't compute block dependencies if there are no real nondebug insns.  */
>>> +  if (no_real_nondebug_insns_p (head, tail))
>>> +    {
>>> +      if (current_nr_blocks > 1)
>>> +	propagate_deps (bb, &tmp_deps);
>>> +      free_deps (&tmp_deps);
>>> +      return;
>>> +    }
>>> +
>>>    sched_analyze (&tmp_deps, head, tail);
>>>
>>>    add_branch_dependences (head, tail);
>>> @@ -2744,6 +2769,24 @@ compute_block_dependences (int bb)
>>>      targetm.sched.dependencies_evaluation_hook (head, tail);
>>>  }
>>>
>>> +/* Artificially resolve forward dependencies for instructions HEAD to TAIL.  */
>>> +
>>> +static void
>>> +resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
>>> +{
>>> +  rtx_insn *insn;
>>> +  rtx_insn *next_tail = NEXT_INSN (tail);
>>> +  sd_iterator_def sd_it;
>>> +  dep_t dep;
>>> +
>>> +  /* There could be some insns which get skipped in scheduling but we compute
>>> +     dependencies for them previously, so make them resolved.  */
>>> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
>>> +    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
>>> +	 sd_iterator_cond (&sd_it, &dep);)
>>> +      sd_resolve_dep (sd_it);
>>> +}
>>> +
>>>  /* Free dependencies of instructions inside BB.  */
>>>  static void
>>>  free_block_dependencies (int bb)
>>> @@ -2753,9 +2796,12 @@ free_block_dependencies (int bb)
>>>
>>>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>>
>>> -  if (no_real_insns_p (head, tail))
>>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>>>      return;
>>>
>>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTICIAL)
>>> +    resolve_forw_deps (head, tail);
>>> +
>>>    sched_free_deps (head, tail, true);
>>>  }
>>>
>>> @@ -3019,7 +3065,7 @@ compute_priorities (void)
>>>        gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>>        get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>>
>>> -      if (no_real_insns_p (head, tail))
>>> +      if (no_real_nondebug_insns_p (head, tail))
>>>  	continue;
>>>
>>>        rgn_n_insns += set_priorities (head, tail);
>>> @@ -3153,7 +3199,7 @@ schedule_region (int rgn)
>>>
>>>  	  get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>>>
>>> -	  if (no_real_insns_p (head, tail))
>>> +	  if (no_real_nondebug_insns_p (head, tail))
>>>  	    {
>>>  	      gcc_assert (first_bb == last_bb);
>>>  	      continue;
>>> @@ -3173,44 +3219,62 @@ schedule_region (int rgn)
>>>
>>>        get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>>>
>>> -      if (no_real_insns_p (head, tail))
>>> +      if (no_real_nondebug_insns_p (head, tail))
>>>  	{
>>>  	  gcc_assert (first_bb == last_bb);
>>>  	  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
>>> -	  continue;
>>> +
>>> +	  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>>> +	    continue;
>>> +
>>> +	  /* As it's not no_real_nondebug_insns_p initially, then it has some
>>> +	     dependencies computed so free it articially.  */
>>> +	  bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTICIAL;
>>>  	}
>>> +      else
>>> +	{
>>> +	  current_sched_info->prev_head = PREV_INSN (head);
>>> +	  current_sched_info->next_tail = NEXT_INSN (tail);
>>>
>>> -      current_sched_info->prev_head = PREV_INSN (head);
>>> -      current_sched_info->next_tail = NEXT_INSN (tail);
>>> +	  remove_notes (head, tail);
>>>
>>> -      remove_notes (head, tail);
>>> +	  unlink_bb_notes (first_bb, last_bb);
>>>
>>> -      unlink_bb_notes (first_bb, last_bb);
>>> +	  target_bb = bb;
>>>
>>> -      target_bb = bb;
>>> +	  gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
>>> +	  current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>>>
>>> -      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
>>> -      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>>> +	  curr_bb = first_bb;
>>> +	  if (dbg_cnt (sched_block))
>>> +	    {
>>> +	      int saved_last_basic_block = last_basic_block_for_fn (cfun);
>>>
>>> -      curr_bb = first_bb;
>>> -      if (dbg_cnt (sched_block))
>>> -        {
>>> -	  int saved_last_basic_block = last_basic_block_for_fn (cfun);
>>> +	      schedule_block (&curr_bb, bb_state[first_bb->index]);
>>> +	      gcc_assert (EBB_FIRST_BB (bb) == first_bb);
>>> +	      sched_rgn_n_insns += sched_n_insns;
>>> +	      realloc_bb_state_array (saved_last_basic_block);
>>> +	      save_state_for_fallthru_edge (last_bb, curr_state);
>>>
>>> -	  schedule_block (&curr_bb, bb_state[first_bb->index]);
>>> -	  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
>>> -	  sched_rgn_n_insns += sched_n_insns;
>>> -	  realloc_bb_state_array (saved_last_basic_block);
>>> -	  save_state_for_fallthru_edge (last_bb, curr_state);
>>> -        }
>>> -      else
>>> -        {
>>> -          sched_rgn_n_insns += rgn_n_insns;
>>> -        }
>>> +	      /* Clean up.  */
>>> +	      if (current_nr_blocks > 1)
>>> +		free_trg_info ();
>>> +	    }
>>> +	  else
>>> +	    bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTICIAL;
>>> +	}
>>>
>>> -      /* Clean up.  */
>>> -      if (current_nr_blocks > 1)
>>> -	free_trg_info ();
>>> +      /* We have counted this block when computing rgn_n_insns
>>> +	 previously, so need to fix up sched_rgn_n_insns now.  */
>>> +      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTICIAL)
>>> +	{
>>> +	  while (head != NEXT_INSN (tail))
>>> +	    {
>>> +	      if (INSN_P (head))
>>> +		sched_rgn_n_insns++;
>>> +	      head = NEXT_INSN (head);
>>> +	    }
>>> +	}
>>>      }
>>>
>>>    /* Sanity check: verify that all region insns were scheduled.  */
>>> @@ -3218,13 +3282,13 @@ schedule_region (int rgn)
>>>
>>>    sched_finish_ready_list ();
>>>
>>> -  /* Done with this region.  */
>>> -  sched_rgn_local_finish ();
>>> -
>>>    /* Free dependencies.  */
>>>    for (bb = 0; bb < current_nr_blocks; ++bb)
>>>      free_block_dependencies (bb);
>>>
>>> +  /* Done with this region.  */
>>> +  sched_rgn_local_finish ();
>>> +
>>>    gcc_assert (haifa_recovery_bb_ever_added_p
>>>  	      || deps_pools_are_empty_p ());
>>>  }
>>> @@ -3445,6 +3509,19 @@ sched_rgn_local_init (int rgn)
>>>  	    e->aux = NULL;
>>>          }
>>>      }
>>> +
>>> +  /* Initialize bb_deps_free_actions.  */
>>> +  bb_deps_free_actions
>>> +    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
>>> +  for (bb = 0; bb < current_nr_blocks; bb++)
>>> +    {
>>> +      rtx_insn *head, *tail;
>>> +      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>> +      if (no_real_nondebug_insns_p (head, tail))
>>> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
>>> +      else
>>> +	bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
>>> +    }
>>>  }
>>>
>>>  /* Free data computed for the finished region.  */
>>> @@ -3462,9 +3539,12 @@ sched_rgn_local_free (void)
>>>  void
>>>  sched_rgn_local_finish (void)
>>>  {
>>> -  if (current_nr_blocks > 1 && !sel_sched_p ())
>>> +  if (!sel_sched_p ())
>>>      {
>>> -      sched_rgn_local_free ();
>>> +      if (current_nr_blocks > 1)
>>> +	sched_rgn_local_free ();
>>> +
>>> +      free (bb_deps_free_actions);
>>>      }
>>>  }
>>>
>>> diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
>>> index cb1cf75bfe4..04415590455 100644
>>> --- a/gcc/sel-sched.cc
>>> +++ b/gcc/sel-sched.cc
>>> @@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)
>>>
>>>        find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);
>>>
>>> -      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
>>> +      if (no_real_nondebug_insns_p (current_sched_info->head,
>>> +				    current_sched_info->tail))
>>>  	continue;
>>>
>>>        if (reset_sched_cycles_p)
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
>>> new file mode 100644
>>> index 00000000000..937224eaa69
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
>>> @@ -0,0 +1,26 @@
>>> +/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
>>> +/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
>>> +
>>> +/* Verify there is no ICE.  */
>>> +
>>> +int a, b, c, e, f;
>>> +float d;
>>> +
>>> +void
>>> +g ()
>>> +{
>>> +  float h, i[1];
>>> +  for (; f;)
>>> +    if (c)
>>> +      {
>>> +	d *e;
>>> +	if (b)
>>> +	  {
>>> +	    float *j = i;
>>> +	    j[0] += 0;
>>> +	  }
>>> +	h += d;
>>> +      }
>>> +  if (h)
>>> +    a = i[0];
>>> +}
>>> --
>>> 2.25.1
>>


BR,
Kewen

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-08-07 10:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-20  6:31 [RFC/PATCH] sched: Consider debug insn in no_real_insns_p [PR108273] Kewen.Lin
2023-03-20  8:03 ` Alexander Monakov
2023-03-20  9:48   ` Kewen.Lin
2023-03-29  7:18     ` [PATCH v2] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273] Kewen.Lin
2023-05-17  6:20       ` Kewen.Lin
2023-06-15  6:39         ` PING^2 " Kewen.Lin
2023-08-07 10:07           ` PING^3 " Kewen.Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).