public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][RFC] More update-ssa speedup
@ 2022-07-07 13:33 Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2022-07-07 13:33 UTC (permalink / raw)
  To: gcc-patches

When we do TODO_update_ssa_no_phi we already avoid computing
dominance frontiers for all blocks - it is worth to also avoid
walking all dominated blocks in the update domwalk and restrict
the walk to the SEME region with the affected blocks.  We can
do that by walking the CFG in reverse from blocks_to_update to
the common immediate dominator, marking blocks in the region
and telling the domwalk to STOP when leaving it.

For an artificial testcase with N adjacent loops with one
unswitching opportunity that takes the incremental SSA updating
off the -ftime-report radar:

 tree loop unswitching              :  11.25 (  3%)   0.09 (  5%)  11.53 (  3%)    36M (  9%)
 `- tree SSA incremental            :  35.74 (  9%)   0.07 (  4%)  36.65 (  9%)  2734k (  1%)

improves to

 tree loop unswitching              :  10.21 (  3%)   0.05 (  3%)  11.50 (  3%)    36M (  9%)
 `- tree SSA incremental            :   0.66 (  0%)   0.02 (  1%)   0.49 (  0%)  2734k (  1%)

for less localized updates the SEME region isn't likely constrained
enough so I've restricted the extra work to TODO_update_ssa_no_phi
callers.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

It's probably the last change that makes a visible difference for
general update-ssa and any specialized manual updating that would be
possible as well would not show up in better numbers than above
(unless I manage to complicate the testcase more).

Comments?

Thanks,
Richard.

	* tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New.
	(rewrite_update_dom_walker::rewrite_update_dom_walker): Update.
	(rewrite_update_dom_walker::m_in_region_flag): New.
	(rewrite_update_dom_walker::before_dom_children): If the region
	to update is marked, STOP at exits.
	(rewrite_blocks): For REWRITE_UPDATE_REGION mark the region
	to be updated.
	(dump_update_ssa): Use bitmap_empty_p.
	(update_ssa): Likewise.  Use REWRITE_UPDATE_REGION when
	TODO_update_ssa_no_phi.
	* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account
	pending update_ssa to the caller.
---
 gcc/tree-cfgcleanup.cc |  6 ++-
 gcc/tree-into-ssa.cc   | 97 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 90 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
index b9ff6896ce6..3535a7e28a4 100644
--- a/gcc/tree-cfgcleanup.cc
+++ b/gcc/tree-cfgcleanup.cc
@@ -1095,7 +1095,11 @@ cleanup_tree_cfg_noloop (unsigned ssa_update_flags)
   /* After doing the above SSA form should be valid (or an update SSA
      should be required).  */
   if (ssa_update_flags)
-    update_ssa (ssa_update_flags);
+    {
+      timevar_pop (TV_TREE_CLEANUP_CFG);
+      update_ssa (ssa_update_flags);
+      timevar_push (TV_TREE_CLEANUP_CFG);
+    }
 
   /* Compute dominator info which we need for the iterative process below.  */
   if (!dom_info_available_p (CDI_DOMINATORS))
diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 9f45e62c6d0..be71b629f97 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -240,7 +240,8 @@ enum rewrite_mode {
 
     /* Incrementally update the SSA web by replacing existing SSA
        names with new ones.  See update_ssa for details.  */
-    REWRITE_UPDATE
+    REWRITE_UPDATE,
+    REWRITE_UPDATE_REGION
 };
 
 /* The set of symbols we ought to re-write into SSA form in update_ssa.  */
@@ -2155,11 +2156,14 @@ rewrite_update_phi_arguments (basic_block bb)
 class rewrite_update_dom_walker : public dom_walker
 {
 public:
-  rewrite_update_dom_walker (cdi_direction direction)
-    : dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1) {}
+  rewrite_update_dom_walker (cdi_direction direction, int in_region_flag = -1)
+    : dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1),
+      m_in_region_flag (in_region_flag) {}
 
   edge before_dom_children (basic_block) final override;
   void after_dom_children (basic_block) final override;
+
+  int m_in_region_flag;
 };
 
 /* Initialization of block data structures for the incremental SSA
@@ -2179,6 +2183,10 @@ rewrite_update_dom_walker::before_dom_children (basic_block bb)
   /* Mark the unwind point for this block.  */
   block_defs_stack.safe_push (NULL_TREE);
 
+  if (m_in_region_flag != -1
+      && !(bb->flags & m_in_region_flag))
+    return STOP;
+
   if (!bitmap_bit_p (blocks_to_update, bb->index))
     return NULL;
 
@@ -2270,8 +2278,8 @@ rewrite_update_dom_walker::after_dom_children (basic_block bb ATTRIBUTE_UNUSED)
    WHAT indicates what actions will be taken by the renamer (see enum
       rewrite_mode).
 
-   BLOCKS are the set of interesting blocks for the dominator walker
-      to process.  If this set is NULL, then all the nodes dominated
+   REGION is a SEME region of interesting blocks for the dominator walker
+      to process.  If this set is invalid, then all the nodes dominated
       by ENTRY are walked.  Otherwise, blocks dominated by ENTRY that
       are not present in BLOCKS are ignored.  */
 
@@ -2283,9 +2291,71 @@ rewrite_blocks (basic_block entry, enum rewrite_mode what)
   /* Recursively walk the dominator tree rewriting each statement in
      each basic block.  */
   if (what == REWRITE_ALL)
-      rewrite_dom_walker (CDI_DOMINATORS).walk (entry);
+    rewrite_dom_walker (CDI_DOMINATORS).walk (entry);
   else if (what == REWRITE_UPDATE)
-      rewrite_update_dom_walker (CDI_DOMINATORS).walk (entry);
+    rewrite_update_dom_walker (CDI_DOMINATORS).walk (entry);
+  else if (what == REWRITE_UPDATE_REGION)
+    {
+      /* First mark all blocks in the SEME region dominated by
+	 entry and exited by blocks not backwards reachable from
+	 blocks_to_update.  Optimize for dense blocks_to_update
+	 so instead of seeding the worklist with a copy of
+	 blocks_to_update treat those blocks explicit.  */
+      auto_bb_flag in_region (cfun);
+      auto_vec<basic_block, 64> extra_rgn;
+      bitmap_iterator bi;
+      unsigned int idx;
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags |= in_region;
+	}
+      auto_bitmap worklist;
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  if (bb != entry)
+	    {
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->preds)
+		{
+		  if ((e->src->flags & in_region)
+		      || dominated_by_p (CDI_DOMINATORS, e->src, bb))
+		    continue;
+		  bitmap_set_bit (worklist, e->src->index);
+		}
+	    }
+	}
+      while (!bitmap_empty_p (worklist))
+	{
+	  int idx = bitmap_first_set_bit (worklist);
+	  bitmap_clear_bit (worklist, idx);
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags |= in_region;
+	  extra_rgn.safe_push (bb);
+	  if (bb != entry)
+	    {
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->preds)
+		{
+		  if ((e->src->flags & in_region)
+		      || dominated_by_p (CDI_DOMINATORS, e->src, bb))
+		    continue;
+		  bitmap_set_bit (worklist, e->src->index);
+		}
+	    }
+	}
+      rewrite_update_dom_walker (CDI_DOMINATORS, in_region).walk (entry);
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags &= ~in_region;
+	}
+      for (auto bb : extra_rgn)
+	bb->flags &= ~in_region;
+    }
   else
     gcc_unreachable ();
 
@@ -2879,7 +2949,7 @@ dump_update_ssa (FILE *file)
   if (!need_ssa_update_p (cfun))
     return;
 
-  if (new_ssa_names && bitmap_first_set_bit (new_ssa_names) >= 0)
+  if (new_ssa_names && !bitmap_empty_p (new_ssa_names))
     {
       sbitmap_iterator sbi;
 
@@ -3389,7 +3459,7 @@ update_ssa (unsigned update_flags)
   /* If there are names defined in the replacement table, prepare
      definition and use sites for all the names in NEW_SSA_NAMES and
      OLD_SSA_NAMES.  */
-  if (bitmap_first_set_bit (new_ssa_names) >= 0)
+  if (!bitmap_empty_p (new_ssa_names))
     {
       statistics_counter_event (cfun, "Incremental SSA update", 1);
 
@@ -3398,7 +3468,7 @@ update_ssa (unsigned update_flags)
       /* If all the names in NEW_SSA_NAMES had been marked for
 	 removal, and there are no symbols to rename, then there's
 	 nothing else to do.  */
-      if (bitmap_first_set_bit (new_ssa_names) < 0
+      if (bitmap_empty_p (new_ssa_names)
 	  && !cfun->gimple_df->ssa_renaming_needed)
 	goto done;
     }
@@ -3503,8 +3573,11 @@ update_ssa (unsigned update_flags)
   FOR_EACH_VEC_ELT (symbols_to_rename, i, sym)
     get_var_info (sym)->info.current_def = NULL_TREE;
 
-  /* Now start the renaming process at START_BB.  */
-  rewrite_blocks (start_bb, REWRITE_UPDATE);
+  /* Now start the renaming process at START_BB.  When not inserting PHIs
+     and thus we are avoiding work on all blocks, try to confine the
+     rewriting domwalk to the affected region, otherwise it's not worth it.  */
+  rewrite_blocks (start_bb,
+		  insert_phi_p ? REWRITE_UPDATE : REWRITE_UPDATE_REGION);
 
   /* Debugging dumps.  */
   if (dump_file)
-- 
2.35.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][RFC] More update-ssa speedup
       [not found] <20220707133323.1C4993856094@sourceware.org>
@ 2022-07-07 14:29 ` Jeff Law
  0 siblings, 0 replies; 4+ messages in thread
From: Jeff Law @ 2022-07-07 14:29 UTC (permalink / raw)
  To: gcc-patches



On 7/7/2022 7:33 AM, Richard Biener via Gcc-patches wrote:
> When we do TODO_update_ssa_no_phi we already avoid computing
> dominance frontiers for all blocks - it is worth to also avoid
> walking all dominated blocks in the update domwalk and restrict
> the walk to the SEME region with the affected blocks.  We can
> do that by walking the CFG in reverse from blocks_to_update to
> the common immediate dominator, marking blocks in the region
> and telling the domwalk to STOP when leaving it.
>
> For an artificial testcase with N adjacent loops with one
> unswitching opportunity that takes the incremental SSA updating
> off the -ftime-report radar:
>
>   tree loop unswitching              :  11.25 (  3%)   0.09 (  5%)  11.53 (  3%)    36M (  9%)
>   `- tree SSA incremental            :  35.74 (  9%)   0.07 (  4%)  36.65 (  9%)  2734k (  1%)
>
> improves to
>
>   tree loop unswitching              :  10.21 (  3%)   0.05 (  3%)  11.50 (  3%)    36M (  9%)
>   `- tree SSA incremental            :   0.66 (  0%)   0.02 (  1%)   0.49 (  0%)  2734k (  1%)
>
> for less localized updates the SEME region isn't likely constrained
> enough so I've restricted the extra work to TODO_update_ssa_no_phi
> callers.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> It's probably the last change that makes a visible difference for
> general update-ssa and any specialized manual updating that would be
> possible as well would not show up in better numbers than above
> (unless I manage to complicate the testcase more).
>
> Comments?
>
> Thanks,
> Richard.
>
> 	* tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New.
> 	(rewrite_update_dom_walker::rewrite_update_dom_walker): Update.
> 	(rewrite_update_dom_walker::m_in_region_flag): New.
> 	(rewrite_update_dom_walker::before_dom_children): If the region
> 	to update is marked, STOP at exits.
> 	(rewrite_blocks): For REWRITE_UPDATE_REGION mark the region
> 	to be updated.
> 	(dump_update_ssa): Use bitmap_empty_p.
> 	(update_ssa): Likewise.  Use REWRITE_UPDATE_REGION when
> 	TODO_update_ssa_no_phi.
> 	* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account
> 	pending update_ssa to the caller.
Overall concept seems quite reasonable to me -- it also largely mirrors 
ideas I was exploring for incremental DOM many years ago.

Jeff

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH][RFC] More update-ssa speedup
@ 2022-07-07 13:33 Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2022-07-07 13:33 UTC (permalink / raw)
  To: gcc-patches

When we do TODO_update_ssa_no_phi we already avoid computing
dominance frontiers for all blocks - it is worth to also avoid
walking all dominated blocks in the update domwalk and restrict
the walk to the SEME region with the affected blocks.  We can
do that by walking the CFG in reverse from blocks_to_update to
the common immediate dominator, marking blocks in the region
and telling the domwalk to STOP when leaving it.

For an artificial testcase with N adjacent loops with one
unswitching opportunity that takes the incremental SSA updating
off the -ftime-report radar:

 tree loop unswitching              :  11.25 (  3%)   0.09 (  5%)  11.53 (  3%)    36M (  9%)
 `- tree SSA incremental            :  35.74 (  9%)   0.07 (  4%)  36.65 (  9%)  2734k (  1%)

improves to

 tree loop unswitching              :  10.21 (  3%)   0.05 (  3%)  11.50 (  3%)    36M (  9%)
 `- tree SSA incremental            :   0.66 (  0%)   0.02 (  1%)   0.49 (  0%)  2734k (  1%)

for less localized updates the SEME region isn't likely constrained
enough so I've restricted the extra work to TODO_update_ssa_no_phi
callers.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

It's probably the last change that makes a visible difference for
general update-ssa and any specialized manual updating that would be
possible as well would not show up in better numbers than above
(unless I manage to complicate the testcase more).

Comments?

Thanks,
Richard.

	* tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New.
	(rewrite_update_dom_walker::rewrite_update_dom_walker): Update.
	(rewrite_update_dom_walker::m_in_region_flag): New.
	(rewrite_update_dom_walker::before_dom_children): If the region
	to update is marked, STOP at exits.
	(rewrite_blocks): For REWRITE_UPDATE_REGION mark the region
	to be updated.
	(dump_update_ssa): Use bitmap_empty_p.
	(update_ssa): Likewise.  Use REWRITE_UPDATE_REGION when
	TODO_update_ssa_no_phi.
	* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account
	pending update_ssa to the caller.
---
 gcc/tree-cfgcleanup.cc |  6 ++-
 gcc/tree-into-ssa.cc   | 97 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 90 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
index b9ff6896ce6..3535a7e28a4 100644
--- a/gcc/tree-cfgcleanup.cc
+++ b/gcc/tree-cfgcleanup.cc
@@ -1095,7 +1095,11 @@ cleanup_tree_cfg_noloop (unsigned ssa_update_flags)
   /* After doing the above SSA form should be valid (or an update SSA
      should be required).  */
   if (ssa_update_flags)
-    update_ssa (ssa_update_flags);
+    {
+      timevar_pop (TV_TREE_CLEANUP_CFG);
+      update_ssa (ssa_update_flags);
+      timevar_push (TV_TREE_CLEANUP_CFG);
+    }
 
   /* Compute dominator info which we need for the iterative process below.  */
   if (!dom_info_available_p (CDI_DOMINATORS))
diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 9f45e62c6d0..be71b629f97 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -240,7 +240,8 @@ enum rewrite_mode {
 
     /* Incrementally update the SSA web by replacing existing SSA
        names with new ones.  See update_ssa for details.  */
-    REWRITE_UPDATE
+    REWRITE_UPDATE,
+    REWRITE_UPDATE_REGION
 };
 
 /* The set of symbols we ought to re-write into SSA form in update_ssa.  */
@@ -2155,11 +2156,14 @@ rewrite_update_phi_arguments (basic_block bb)
 class rewrite_update_dom_walker : public dom_walker
 {
 public:
-  rewrite_update_dom_walker (cdi_direction direction)
-    : dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1) {}
+  rewrite_update_dom_walker (cdi_direction direction, int in_region_flag = -1)
+    : dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1),
+      m_in_region_flag (in_region_flag) {}
 
   edge before_dom_children (basic_block) final override;
   void after_dom_children (basic_block) final override;
+
+  int m_in_region_flag;
 };
 
 /* Initialization of block data structures for the incremental SSA
@@ -2179,6 +2183,10 @@ rewrite_update_dom_walker::before_dom_children (basic_block bb)
   /* Mark the unwind point for this block.  */
   block_defs_stack.safe_push (NULL_TREE);
 
+  if (m_in_region_flag != -1
+      && !(bb->flags & m_in_region_flag))
+    return STOP;
+
   if (!bitmap_bit_p (blocks_to_update, bb->index))
     return NULL;
 
@@ -2270,8 +2278,8 @@ rewrite_update_dom_walker::after_dom_children (basic_block bb ATTRIBUTE_UNUSED)
    WHAT indicates what actions will be taken by the renamer (see enum
       rewrite_mode).
 
-   BLOCKS are the set of interesting blocks for the dominator walker
-      to process.  If this set is NULL, then all the nodes dominated
+   REGION is a SEME region of interesting blocks for the dominator walker
+      to process.  If this set is invalid, then all the nodes dominated
       by ENTRY are walked.  Otherwise, blocks dominated by ENTRY that
       are not present in BLOCKS are ignored.  */
 
@@ -2283,9 +2291,71 @@ rewrite_blocks (basic_block entry, enum rewrite_mode what)
   /* Recursively walk the dominator tree rewriting each statement in
      each basic block.  */
   if (what == REWRITE_ALL)
-      rewrite_dom_walker (CDI_DOMINATORS).walk (entry);
+    rewrite_dom_walker (CDI_DOMINATORS).walk (entry);
   else if (what == REWRITE_UPDATE)
-      rewrite_update_dom_walker (CDI_DOMINATORS).walk (entry);
+    rewrite_update_dom_walker (CDI_DOMINATORS).walk (entry);
+  else if (what == REWRITE_UPDATE_REGION)
+    {
+      /* First mark all blocks in the SEME region dominated by
+	 entry and exited by blocks not backwards reachable from
+	 blocks_to_update.  Optimize for dense blocks_to_update
+	 so instead of seeding the worklist with a copy of
+	 blocks_to_update treat those blocks explicit.  */
+      auto_bb_flag in_region (cfun);
+      auto_vec<basic_block, 64> extra_rgn;
+      bitmap_iterator bi;
+      unsigned int idx;
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags |= in_region;
+	}
+      auto_bitmap worklist;
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  if (bb != entry)
+	    {
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->preds)
+		{
+		  if ((e->src->flags & in_region)
+		      || dominated_by_p (CDI_DOMINATORS, e->src, bb))
+		    continue;
+		  bitmap_set_bit (worklist, e->src->index);
+		}
+	    }
+	}
+      while (!bitmap_empty_p (worklist))
+	{
+	  int idx = bitmap_first_set_bit (worklist);
+	  bitmap_clear_bit (worklist, idx);
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags |= in_region;
+	  extra_rgn.safe_push (bb);
+	  if (bb != entry)
+	    {
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->preds)
+		{
+		  if ((e->src->flags & in_region)
+		      || dominated_by_p (CDI_DOMINATORS, e->src, bb))
+		    continue;
+		  bitmap_set_bit (worklist, e->src->index);
+		}
+	    }
+	}
+      rewrite_update_dom_walker (CDI_DOMINATORS, in_region).walk (entry);
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags &= ~in_region;
+	}
+      for (auto bb : extra_rgn)
+	bb->flags &= ~in_region;
+    }
   else
     gcc_unreachable ();
 
@@ -2879,7 +2949,7 @@ dump_update_ssa (FILE *file)
   if (!need_ssa_update_p (cfun))
     return;
 
-  if (new_ssa_names && bitmap_first_set_bit (new_ssa_names) >= 0)
+  if (new_ssa_names && !bitmap_empty_p (new_ssa_names))
     {
       sbitmap_iterator sbi;
 
@@ -3389,7 +3459,7 @@ update_ssa (unsigned update_flags)
   /* If there are names defined in the replacement table, prepare
      definition and use sites for all the names in NEW_SSA_NAMES and
      OLD_SSA_NAMES.  */
-  if (bitmap_first_set_bit (new_ssa_names) >= 0)
+  if (!bitmap_empty_p (new_ssa_names))
     {
       statistics_counter_event (cfun, "Incremental SSA update", 1);
 
@@ -3398,7 +3468,7 @@ update_ssa (unsigned update_flags)
       /* If all the names in NEW_SSA_NAMES had been marked for
 	 removal, and there are no symbols to rename, then there's
 	 nothing else to do.  */
-      if (bitmap_first_set_bit (new_ssa_names) < 0
+      if (bitmap_empty_p (new_ssa_names)
 	  && !cfun->gimple_df->ssa_renaming_needed)
 	goto done;
     }
@@ -3503,8 +3573,11 @@ update_ssa (unsigned update_flags)
   FOR_EACH_VEC_ELT (symbols_to_rename, i, sym)
     get_var_info (sym)->info.current_def = NULL_TREE;
 
-  /* Now start the renaming process at START_BB.  */
-  rewrite_blocks (start_bb, REWRITE_UPDATE);
+  /* Now start the renaming process at START_BB.  When not inserting PHIs
+     and thus we are avoiding work on all blocks, try to confine the
+     rewriting domwalk to the affected region, otherwise it's not worth it.  */
+  rewrite_blocks (start_bb,
+		  insert_phi_p ? REWRITE_UPDATE : REWRITE_UPDATE_REGION);
 
   /* Debugging dumps.  */
   if (dump_file)
-- 
2.35.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH][RFC] More update-ssa speedup
@ 2022-07-07 13:33 Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2022-07-07 13:33 UTC (permalink / raw)
  To: gcc-patches

When we do TODO_update_ssa_no_phi we already avoid computing
dominance frontiers for all blocks - it is worth to also avoid
walking all dominated blocks in the update domwalk and restrict
the walk to the SEME region with the affected blocks.  We can
do that by walking the CFG in reverse from blocks_to_update to
the common immediate dominator, marking blocks in the region
and telling the domwalk to STOP when leaving it.

For an artificial testcase with N adjacent loops with one
unswitching opportunity that takes the incremental SSA updating
off the -ftime-report radar:

 tree loop unswitching              :  11.25 (  3%)   0.09 (  5%)  11.53 (  3%)    36M (  9%)
 `- tree SSA incremental            :  35.74 (  9%)   0.07 (  4%)  36.65 (  9%)  2734k (  1%)

improves to

 tree loop unswitching              :  10.21 (  3%)   0.05 (  3%)  11.50 (  3%)    36M (  9%)
 `- tree SSA incremental            :   0.66 (  0%)   0.02 (  1%)   0.49 (  0%)  2734k (  1%)

for less localized updates the SEME region isn't likely constrained
enough so I've restricted the extra work to TODO_update_ssa_no_phi
callers.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

It's probably the last change that makes a visible difference for
general update-ssa and any specialized manual updating that would be
possible as well would not show up in better numbers than above
(unless I manage to complicate the testcase more).

Comments?

Thanks,
Richard.

	* tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New.
	(rewrite_update_dom_walker::rewrite_update_dom_walker): Update.
	(rewrite_update_dom_walker::m_in_region_flag): New.
	(rewrite_update_dom_walker::before_dom_children): If the region
	to update is marked, STOP at exits.
	(rewrite_blocks): For REWRITE_UPDATE_REGION mark the region
	to be updated.
	(dump_update_ssa): Use bitmap_empty_p.
	(update_ssa): Likewise.  Use REWRITE_UPDATE_REGION when
	TODO_update_ssa_no_phi.
	* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account
	pending update_ssa to the caller.
---
 gcc/tree-cfgcleanup.cc |  6 ++-
 gcc/tree-into-ssa.cc   | 97 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 90 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc
index b9ff6896ce6..3535a7e28a4 100644
--- a/gcc/tree-cfgcleanup.cc
+++ b/gcc/tree-cfgcleanup.cc
@@ -1095,7 +1095,11 @@ cleanup_tree_cfg_noloop (unsigned ssa_update_flags)
   /* After doing the above SSA form should be valid (or an update SSA
      should be required).  */
   if (ssa_update_flags)
-    update_ssa (ssa_update_flags);
+    {
+      timevar_pop (TV_TREE_CLEANUP_CFG);
+      update_ssa (ssa_update_flags);
+      timevar_push (TV_TREE_CLEANUP_CFG);
+    }
 
   /* Compute dominator info which we need for the iterative process below.  */
   if (!dom_info_available_p (CDI_DOMINATORS))
diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 9f45e62c6d0..be71b629f97 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -240,7 +240,8 @@ enum rewrite_mode {
 
     /* Incrementally update the SSA web by replacing existing SSA
        names with new ones.  See update_ssa for details.  */
-    REWRITE_UPDATE
+    REWRITE_UPDATE,
+    REWRITE_UPDATE_REGION
 };
 
 /* The set of symbols we ought to re-write into SSA form in update_ssa.  */
@@ -2155,11 +2156,14 @@ rewrite_update_phi_arguments (basic_block bb)
 class rewrite_update_dom_walker : public dom_walker
 {
 public:
-  rewrite_update_dom_walker (cdi_direction direction)
-    : dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1) {}
+  rewrite_update_dom_walker (cdi_direction direction, int in_region_flag = -1)
+    : dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1),
+      m_in_region_flag (in_region_flag) {}
 
   edge before_dom_children (basic_block) final override;
   void after_dom_children (basic_block) final override;
+
+  int m_in_region_flag;
 };
 
 /* Initialization of block data structures for the incremental SSA
@@ -2179,6 +2183,10 @@ rewrite_update_dom_walker::before_dom_children (basic_block bb)
   /* Mark the unwind point for this block.  */
   block_defs_stack.safe_push (NULL_TREE);
 
+  if (m_in_region_flag != -1
+      && !(bb->flags & m_in_region_flag))
+    return STOP;
+
   if (!bitmap_bit_p (blocks_to_update, bb->index))
     return NULL;
 
@@ -2270,8 +2278,8 @@ rewrite_update_dom_walker::after_dom_children (basic_block bb ATTRIBUTE_UNUSED)
    WHAT indicates what actions will be taken by the renamer (see enum
       rewrite_mode).
 
-   BLOCKS are the set of interesting blocks for the dominator walker
-      to process.  If this set is NULL, then all the nodes dominated
+   REGION is a SEME region of interesting blocks for the dominator walker
+      to process.  If this set is invalid, then all the nodes dominated
       by ENTRY are walked.  Otherwise, blocks dominated by ENTRY that
       are not present in BLOCKS are ignored.  */
 
@@ -2283,9 +2291,71 @@ rewrite_blocks (basic_block entry, enum rewrite_mode what)
   /* Recursively walk the dominator tree rewriting each statement in
      each basic block.  */
   if (what == REWRITE_ALL)
-      rewrite_dom_walker (CDI_DOMINATORS).walk (entry);
+    rewrite_dom_walker (CDI_DOMINATORS).walk (entry);
   else if (what == REWRITE_UPDATE)
-      rewrite_update_dom_walker (CDI_DOMINATORS).walk (entry);
+    rewrite_update_dom_walker (CDI_DOMINATORS).walk (entry);
+  else if (what == REWRITE_UPDATE_REGION)
+    {
+      /* First mark all blocks in the SEME region dominated by
+	 entry and exited by blocks not backwards reachable from
+	 blocks_to_update.  Optimize for dense blocks_to_update
+	 so instead of seeding the worklist with a copy of
+	 blocks_to_update treat those blocks explicit.  */
+      auto_bb_flag in_region (cfun);
+      auto_vec<basic_block, 64> extra_rgn;
+      bitmap_iterator bi;
+      unsigned int idx;
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags |= in_region;
+	}
+      auto_bitmap worklist;
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  if (bb != entry)
+	    {
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->preds)
+		{
+		  if ((e->src->flags & in_region)
+		      || dominated_by_p (CDI_DOMINATORS, e->src, bb))
+		    continue;
+		  bitmap_set_bit (worklist, e->src->index);
+		}
+	    }
+	}
+      while (!bitmap_empty_p (worklist))
+	{
+	  int idx = bitmap_first_set_bit (worklist);
+	  bitmap_clear_bit (worklist, idx);
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags |= in_region;
+	  extra_rgn.safe_push (bb);
+	  if (bb != entry)
+	    {
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->preds)
+		{
+		  if ((e->src->flags & in_region)
+		      || dominated_by_p (CDI_DOMINATORS, e->src, bb))
+		    continue;
+		  bitmap_set_bit (worklist, e->src->index);
+		}
+	    }
+	}
+      rewrite_update_dom_walker (CDI_DOMINATORS, in_region).walk (entry);
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, idx, bi)
+	{
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, idx);
+	  bb->flags &= ~in_region;
+	}
+      for (auto bb : extra_rgn)
+	bb->flags &= ~in_region;
+    }
   else
     gcc_unreachable ();
 
@@ -2879,7 +2949,7 @@ dump_update_ssa (FILE *file)
   if (!need_ssa_update_p (cfun))
     return;
 
-  if (new_ssa_names && bitmap_first_set_bit (new_ssa_names) >= 0)
+  if (new_ssa_names && !bitmap_empty_p (new_ssa_names))
     {
       sbitmap_iterator sbi;
 
@@ -3389,7 +3459,7 @@ update_ssa (unsigned update_flags)
   /* If there are names defined in the replacement table, prepare
      definition and use sites for all the names in NEW_SSA_NAMES and
      OLD_SSA_NAMES.  */
-  if (bitmap_first_set_bit (new_ssa_names) >= 0)
+  if (!bitmap_empty_p (new_ssa_names))
     {
       statistics_counter_event (cfun, "Incremental SSA update", 1);
 
@@ -3398,7 +3468,7 @@ update_ssa (unsigned update_flags)
       /* If all the names in NEW_SSA_NAMES had been marked for
 	 removal, and there are no symbols to rename, then there's
 	 nothing else to do.  */
-      if (bitmap_first_set_bit (new_ssa_names) < 0
+      if (bitmap_empty_p (new_ssa_names)
 	  && !cfun->gimple_df->ssa_renaming_needed)
 	goto done;
     }
@@ -3503,8 +3573,11 @@ update_ssa (unsigned update_flags)
   FOR_EACH_VEC_ELT (symbols_to_rename, i, sym)
     get_var_info (sym)->info.current_def = NULL_TREE;
 
-  /* Now start the renaming process at START_BB.  */
-  rewrite_blocks (start_bb, REWRITE_UPDATE);
+  /* Now start the renaming process at START_BB.  When not inserting PHIs
+     and thus we are avoiding work on all blocks, try to confine the
+     rewriting domwalk to the affected region, otherwise it's not worth it.  */
+  rewrite_blocks (start_bb,
+		  insert_phi_p ? REWRITE_UPDATE : REWRITE_UPDATE_REGION);
 
   /* Debugging dumps.  */
   if (dump_file)
-- 
2.35.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-07-07 14:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-07 13:33 [PATCH][RFC] More update-ssa speedup Richard Biener
     [not found] <20220707133323.1C4993856094@sourceware.org>
2022-07-07 14:29 ` Jeff Law
  -- strict thread matches above, loose matches on Subject: below --
2022-07-07 13:33 Richard Biener
2022-07-07 13:33 Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).