public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* RFA: Improve tree-ssa-sink block selection
@ 2011-10-18  7:23 Jeff Law
  2011-10-18  8:47 ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Law @ 2011-10-18  7:23 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2699 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


tree-ssa-sink.c could benefit from a little TLC in its code to select
statement's final location.

For those not familiar with tree-ssa-sink, it attempts to sink
statements down in the CFG to points where they're less likely to
execute.  Ideally we move into lower loop nests and more control
dependent paths.

tree-ssa-sink has some code to avoid sinking in certain cases.  For
example, if the selected block is at a deeper loop nest or if the
selected block post-dominates the statement's original block.  These
are (of course) good things, but both could use some improvement.

First, the selected block might be at a deeper loop nest, but we can
safely move up the dominator tree from the selected block to the
statement's original block looking for a better location.   This
allows more sinking.  We should think of the selected block as the
latest valid block -- we're then free to select any block between the
statement's original block and the latest valid block on the dominator
tree.

Second, while we avoid sinking to a post-dominator of the statement's
original block, the more general heuristic should be to avoid sinking
if the target block does not have a frequency significantly smaller
than the statement's original block.  ie, sinking past conditionals
which will never trigger (checking abort paths for example) is not
profitable.  Furthermore, it appears that gratutious sinking tends to
actually generate worse code.  A new PARAM is introduced to control
the aggressiveness of sinking.

I've done a fair number of experiments with valgrind and the sweet
spot for sinking non-memory referencing statements seems to be the new
block having a frequency less than 75% of the original block's
frequency. Statements which hit memory should be moved somewhat more
aggressively, so they're adjusted slightly.

According to valgrind this patch reduces the amount of instructions
executed by just over 1.15%.  Not huge, but nothing to sneeze out
given we're just changing the target block selector.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  OK
for trunk?







-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOnQpFAAoJEBRtltQi2kC72GAIAKIHiG/QmV9cc2k6tyMQXXB9
o3ell9hAQJXD5fXm5VfzBK+NbkrYusMb1SsOe5CPpdL9LCnUt2obhCaOlZpH3GWC
TdrAkv96m24Y23oVnn7c2cWWjHflTBOCfNzYddG1W2GWwZx7t1Jz1uM4HJd334F4
aijyksbYxEWUs3vCmaeJ1/7oHS71QrgpUqPKrcw4oV97CtszgPiZfgdV24fzuGsb
WsPUGXH93pqJm04E6EQ1jI5dr2nfpazhuSkEhOR+9gZiNYR9Dt4sFpjwgX1o0NVH
UbS4p43RpPQHEGoBxEYnvLVnhVa1n6ywdnNvGAXHcGyDGMOuo6E5GGvFm9Kj7QI=
=9a+g
-----END PGP SIGNATURE-----

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 8544 bytes --]

	* doc/invoke.texi (sink-frequency-threshold): Document.
	* tree-ssa-sink.c: Include params.h.
	(select_best_block): New function.
	(statement_sink_location): Use it.
	* params.def (SINK_FREQUENCY_THRESHOLD): New PARAM.

	
Index: doc/invoke.texi
===================================================================
*** doc/invoke.texi	(revision 179664)
--- doc/invoke.texi	(working copy)
*************** partitions.
*** 9113,9118 ****
--- 9113,9125 ----
  The maximum number of namespaces to consult for suggestions when C++
  name lookup fails for an identifier.  The default is 1000.
  
+ @item sink-frequency-threshold
+ The maximum relative execution frequency (in percents) of the target block
+ relative to a statement's original block to allow statement sinking of a
+ statement.  Larger numbers result in more aggressive statement sinking.
+ The default value is 75.  A small positive adjustment is applied for
+ statements with memory operands as those are even more profitable so sink.
+ 
  @item max-stores-to-sink
  The maximum number of conditional stores paires that can be sunk.  Set to 0
  if either vectorization (@option{-ftree-vectorize}) or if-conversion
Index: tree-ssa-sink.c
===================================================================
*** tree-ssa-sink.c	(revision 180108)
--- tree-ssa-sink.c	(working copy)
*************** along with GCC; see the file COPYING3.  
*** 40,45 ****
--- 40,46 ----
  #include "bitmap.h"
  #include "langhooks.h"
  #include "cfgloop.h"
+ #include "params.h"
  
  /* TODO:
     1. Sinking store only using scalar promotion (IE without moving the RHS):
*************** nearest_common_dominator_of_uses (gimple
*** 258,263 ****
--- 259,329 ----
    return commondom;
  }
  
+ /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
+    tree, return the best basic block between them (inclusive) to place
+    statements.
+ 
+    We want the most control dependent block in the shallowest loop nest.
+ 
+    If the resulting block is in a shallower loop nest, then use it.  Else
+    only use the resulting block if it has significantly lower execution
+    frequency than EARLY_BB to avoid gratutious statement movement.  We
+    consider statements with VOPS more desirable to move.
+ 
+    This pass would obviously benefit from PDO as it utilizes block
+    frequencies.  It would also benefit from recomputing frequencies
+    if profile data is not available since frequencies often get out
+    of sync with reality.  */
+ 
+ static basic_block
+ select_best_block (basic_block early_bb,
+ 		   basic_block late_bb,
+ 		   gimple stmt)
+ {
+   basic_block best_bb = late_bb;
+   basic_block temp_bb = late_bb;
+   int threshold;
+ 
+   while (temp_bb != early_bb)
+     {
+       /* If we've moved into a lower loop nest, then that becomes
+ 	 our best block.  */
+       if (temp_bb->loop_depth < best_bb->loop_depth)
+ 	best_bb = temp_bb;
+ 
+       /* Walk up the dominator tree, hopefully we'll find a shallower
+  	 loop nest.  */
+       temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+     }
+ 
+   /* If we found a shallower loop nest, then we always consider that
+      a win.  This will always give us the most control dependent block
+      within that loop nest.  */
+   if (best_bb->loop_depth < early_bb->loop_depth)
+     return best_bb;
+ 
+   /* Get the sinking threshold.  If the statement to be moved has memory
+      operands, then increase the threshold by 7% as those are even more
+      profitable to avoid, clamping at 100%.  */
+   threshold = PARAM_VALUE (PARAM_SINK_FREQUENCY_THRESHOLD);
+   if (gimple_vuse (stmt) || gimple_vdef (stmt))
+     {
+       threshold += 7;
+       if (threshold > 100)
+ 	threshold = 100;
+     }
+ 
+   /* If BEST_BB is at the same nesting level, then require it to have
+      significantly lower execution frequency to avoid gratutious movement.  */
+   if (best_bb->loop_depth == early_bb->loop_depth
+       && best_bb->frequency < (early_bb->frequency * threshold / 100.0))
+     return best_bb;
+ 
+   /* No better block found, so return EARLY_BB, which happens to be the
+      statement's original block.  */
+   return early_bb;
+ }
+ 
  /* Given a statement (STMT) and the basic block it is currently in (FROMBB),
     determine the location to sink the statement to, if any.
     Returns true if there is such location; in that case, TOGSI points to the
*************** statement_sink_location (gimple stmt, ba
*** 379,402 ****
        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
  	return false;
  
!       /* It doesn't make sense to move to a dominator that post-dominates
! 	 frombb, because it means we've just moved it into a path that always
! 	 executes if frombb executes, instead of reducing the number of
! 	 executions .  */
!       if (dominated_by_p (CDI_POST_DOMINATORS, frombb, commondom))
! 	{
! 	  if (dump_file && (dump_flags & TDF_DETAILS))
! 	    fprintf (dump_file, "Not moving store, common dominator post-dominates from block.\n");
! 	  return false;
! 	}
  
!       if (commondom == frombb || commondom->loop_depth > frombb->loop_depth)
! 	return false;
!       if (dump_file && (dump_flags & TDF_DETAILS))
! 	{
! 	  fprintf (dump_file, "Common dominator of all uses is %d\n",
! 		   commondom->index);
! 	}
  
        *togsi = gsi_after_labels (commondom);
  
--- 445,454 ----
        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
  	return false;
  
!       commondom = select_best_block (frombb, commondom, stmt);
  
!       if (commondom == frombb)
! 	return false;	
  
        *togsi = gsi_after_labels (commondom);
  
*************** statement_sink_location (gimple stmt, ba
*** 415,427 ****
        if (gimple_code (use) != GIMPLE_PHI)
  	{
  	  sinkbb = gimple_bb (use);
! 	  if (sinkbb == frombb || sinkbb->loop_depth > frombb->loop_depth
! 	      || sinkbb->loop_father != frombb->loop_father)
! 	    return false;
  
! 	  /* Move the expression to a post dominator can't reduce the number of
! 	     executions.  */
! 	  if (dominated_by_p (CDI_POST_DOMINATORS, frombb, sinkbb))
  	    return false;
  
  	  *togsi = gsi_for_stmt (use);
--- 467,475 ----
        if (gimple_code (use) != GIMPLE_PHI)
  	{
  	  sinkbb = gimple_bb (use);
! 	  sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
  
! 	  if (sinkbb == frombb)
  	    return false;
  
  	  *togsi = gsi_for_stmt (use);
*************** statement_sink_location (gimple stmt, ba
*** 431,451 ****
      }
  
    sinkbb = find_bb_for_arg (use, DEF_FROM_PTR (def_p));
-   if (!sinkbb)
-     return false;
  
!   /* This will happen when you have
!      a_3 = PHI <a_13, a_26>
! 
!      a_26 = VDEF <a_3>
! 
!      If the use is a phi, and is in the same bb as the def,
!      we can't sink it.  */
! 
!   if (gimple_bb (use) == frombb)
      return false;
!   if (sinkbb == frombb || sinkbb->loop_depth > frombb->loop_depth
!       || sinkbb->loop_father != frombb->loop_father)
      return false;
  
    /* If the latch block is empty, don't make it non-empty by sinking
--- 479,491 ----
      }
  
    sinkbb = find_bb_for_arg (use, DEF_FROM_PTR (def_p));
  
!   /* This can happen if there are multiple uses in a PHI.  */
!   if (!sinkbb)
      return false;
!   
!   sinkbb = select_best_block (frombb, sinkbb, stmt);
!   if (!sinkbb || sinkbb == frombb)
      return false;
  
    /* If the latch block is empty, don't make it non-empty by sinking
*************** statement_sink_location (gimple stmt, ba
*** 454,464 ****
        && empty_block_p (sinkbb))
      return false;
  
-   /* Move the expression to a post dominator can't reduce the number of
-      executions.  */
-   if (dominated_by_p (CDI_POST_DOMINATORS, frombb, sinkbb))
-     return false;
- 
    *togsi = gsi_after_labels (sinkbb);
  
    return true;
--- 494,499 ----
Index: params.def
===================================================================
*** params.def	(revision 179664)
--- params.def	(working copy)
*************** DEFPARAM(PARAM_MAX_RELOAD_SEARCH_INSNS,
*** 566,571 ****
--- 566,576 ----
  	 "The maximum number of instructions to search backward when looking for equivalent reload",
  	 100, 0, 0)
  
+ DEFPARAM(PARAM_SINK_FREQUENCY_THRESHOLD,
+ 	 "sink-frequency-threshold",
+ 	 "Target block's relative execution frequency (as a percentage) required to sink a statement",
+ 	 75, 0, 100)
+ 
  DEFPARAM(PARAM_MAX_SCHED_REGION_BLOCKS,
  	 "max-sched-region-blocks",
  	 "The maximum number of blocks in a region to be considered for interblock scheduling",

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFA: Improve tree-ssa-sink block selection
  2011-10-18  7:23 RFA: Improve tree-ssa-sink block selection Jeff Law
@ 2011-10-18  8:47 ` Paolo Bonzini
  2011-10-18 19:08   ` Jeff Law
  2011-10-20  7:41   ` Jeff Law
  0 siblings, 2 replies; 4+ messages in thread
From: Paolo Bonzini @ 2011-10-18  8:47 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On 10/18/2011 07:10 AM, Jeff Law wrote:
> --- 467,475 ----
>          if (gimple_code (use) != GIMPLE_PHI)
>    	{
>    	  sinkbb = gimple_bb (use);
> ! 	  sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
>
> ! 	  if (sinkbb == frombb)
>    	    return false;
>
>    	  *togsi = gsi_for_stmt (use);

Useless assignment of sinkbb, otherwise looks fine.

By the way, is it intended that sink_code_in_bb visits again 
postdominators that were already visited (which with domwalk would come 
for free)?  As it is, the pass is quadratic when you have something like 
this:

   if (x1) y; else z;
   if (x2) y; else z;
   ...
   if (x9999) y; else z;
   if (x10000) y; else z;

where the postdominator tree is

              x1
               |
              x2
               |
            x9999
               |
            x10000 y1 z1 y2 z2 ... y10000 z10000
                \___|__|__|__|______/__,---'
                           |
                       EXIT_BLOCK

Paolo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFA: Improve tree-ssa-sink block selection
  2011-10-18  8:47 ` Paolo Bonzini
@ 2011-10-18 19:08   ` Jeff Law
  2011-10-20  7:41   ` Jeff Law
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff Law @ 2011-10-18 19:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc-patches

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/18/11 01:44, Paolo Bonzini wrote:
> On 10/18/2011 07:10 AM, Jeff Law wrote:
>> --- 467,475 ---- if (gimple_code (use) != GIMPLE_PHI) { sinkbb =
>> gimple_bb (use); !       sinkbb = select_best_block (frombb,
>> gimple_bb (use), stmt);
>> 
>> !       if (sinkbb == frombb) return false;
>> 
>> *togsi = gsi_for_stmt (use);
> 
> Useless assignment of sinkbb, otherwise looks fine.
Thanks.  I'll fix that.

> 
> By the way, is it intended that sink_code_in_bb visits again 
> postdominators that were already visited (which with domwalk would
> come for free)?  As it is, the pass is quadratic when you have
> something like this:
I haven't really looked at the visitation order in tree-ssa-sink.c.
The formulation as a walk over the pdom tree and sinking statements
within each block is unfortunate, but workable.

I'm not sure where you're seeing multiple visits of an individual
block, but maybe I've got some kind of a mental block.

The other way to formulate this problem is with recursion.  A DFS walk
to sink each statement's immediate uses, then sinking the statement
itself.  It's a nice, simple formulation that is annoyingly complex to
implement in GCC due to "interesting" aspects of our immediate use
iterators.


Jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOnb/bAAoJEBRtltQi2kC7i7oIAIXnmR76SMZIBDQLHvMqEAME
qTG5HkbL0kCb4EDMHYRXA4U8Wot1UWiuAkJzihBQMRWiqOs5v555iUx6N+M/l2Fu
w+1SyqeZacNBLm9/bI/Ynbr55BXF3aKPxES5kqRWIJE1kCoOqKFOWAizgWJmyfMG
ct13PpUqU1CT572cwl+GgCQulEWXQEgeOH/Iu/gLPFv+PSEhkEN0vA6foBYgCnk6
d4lGaa3NcQsUwhAlu4rNgQVAZL3cpTRL8QDIultxg3j/zEhcTwJ4YB4azDQcsRQ4
ZasKqD7m4HXnvuqS8TI66v+w4wKsparWyKTXAqp4dibnhON4l2Omes/ZmJokiUs=
=HXaj
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFA: Improve tree-ssa-sink block selection
  2011-10-18  8:47 ` Paolo Bonzini
  2011-10-18 19:08   ` Jeff Law
@ 2011-10-20  7:41   ` Jeff Law
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff Law @ 2011-10-20  7:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc-patches

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/18/11 01:44, Paolo Bonzini wrote:
> On 10/18/2011 07:10 AM, Jeff Law wrote:
>> --- 467,475 ---- if (gimple_code (use) != GIMPLE_PHI) { sinkbb =
>> gimple_bb (use); !       sinkbb = select_best_block (frombb,
>> gimple_bb (use), stmt);
>> 
>> !       if (sinkbb == frombb) return false;
>> 
>> *togsi = gsi_for_stmt (use);
> 
> Useless assignment of sinkbb, otherwise looks fine.
I've removed the useless assignment and re-tested everything with no
regressions.  I'm actually taking a few days off, so I'll check in the
updated patch after I return.

jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOn7YpAAoJEBRtltQi2kC78/sH/0LVTMn8x2EUnOgKIZL8Acgk
eYX7Er/cZZRkOUqtwdmwJNqqoNDVr3/0vS1JWnIGL9PvlEbpfb8AwLuUGlN6806x
iKbKVJGFHJ1LTSpxeebf2vI+R+0Ti8otkve01hcnq2rSOBv1j5C1MqhjuBeVlILs
ud0+zy7QxJRBr7YbXHaHN6ncJ+MqPYSDiWnwpvj9Obp6E/vDBc7biaaYSm08Pow/
G4gs3V3akjG3d/nP8XlgCucpJABZTmLpALrrV+s49AQKaARy+kVPfsRhIC8x3axL
MpXWLg9t8ZqRdW5pqAAQCbSTrKNPwX80NMvNFYqQjYU5HArNAZtoGffoQ6K158s=
=1D9Q
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-10-20  5:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-18  7:23 RFA: Improve tree-ssa-sink block selection Jeff Law
2011-10-18  8:47 ` Paolo Bonzini
2011-10-18 19:08   ` Jeff Law
2011-10-20  7:41   ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).