public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH RFA] Implement register pressure directed hoist pass
       [not found] <50655073.e54c420a.651e.ffffac0fSMTPIN_ADDED@mx.google.com>
@ 2012-09-28  9:24 ` Steven Bosscher
  2012-09-28 11:05   ` Bin Cheng
                     ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Steven Bosscher @ 2012-09-28  9:24 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, Eric Botcazou, Richard Sandiford, vmakarov

On Fri, Sep 28, 2012 at 9:18 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>         (get_regno_pressure_class, get_pressure_class_and_nregs)

Broken long lines in a ChangeLog entry end with a ",".


>         (change_pressure, mark_regno_live, mark_regno_death, mark_reg_death)
>         (mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)

Please use the DF caches instead of note_stores, note_uses, etc.


>         (free_bb_data): New.

Please use alloc_aux_for_blocks (in calculate_bb_reg_pressure) and
free_aux_for_block.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-09-28  9:24 ` [PATCH RFA] Implement register pressure directed hoist pass Steven Bosscher
@ 2012-09-28 11:05   ` Bin Cheng
  2012-09-28 11:26   ` Pedro Alves
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Bin Cheng @ 2012-09-28 11:05 UTC (permalink / raw)
  To: 'Steven Bosscher'
  Cc: gcc-patches, Eric Botcazou, Richard Sandiford, vmakarov

Thanks for comments.

> -----Original Message-----
> From: Steven Bosscher [mailto:stevenb.gcc@gmail.com]
> Sent: Friday, September 28, 2012 4:29 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; Eric Botcazou; Richard Sandiford;
> vmakarov@redhat.com
> Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass
> 
> On Fri, Sep 28, 2012 at 9:18 AM, Bin Cheng <bin.cheng@arm.com> wrote:
> >         (get_regno_pressure_class, get_pressure_class_and_nregs)
> 
> Broken long lines in a ChangeLog entry end with a ",".

Did the mail-client wrap this line? I found no line exceeding 80 characters,
or I missed something?

> 
> 
> >         (change_pressure, mark_regno_live, mark_regno_death,
mark_reg_death)
> >         (mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)
> 
> Please use the DF caches instead of note_stores, note_uses, etc.

These register pressure calculation codes are copied from loop-invariant.c.
I was thinking: Should we just export these interfaces from
loop-invariant.c, or copy it as in this patch. Maybe we can abstract these
function/data-structure into a common file? I would like to hear your
comments.

If we decide to keep these codes in gcse.c, I will make the change using DF
caches according to your comments. 
Moreover, it seems I still have to iterate REG_NOTES to find
REG_UNUSED/REG_DEAD information, or I can get them from DF caches too?
Thanks.
 
> 
> 
> >         (free_bb_data): New.
> 
> Please use alloc_aux_for_blocks (in calculate_bb_reg_pressure) and
> free_aux_for_block.

Yes.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFA] Implement register pressure directed hoist pass
  2012-09-28  9:24 ` [PATCH RFA] Implement register pressure directed hoist pass Steven Bosscher
  2012-09-28 11:05   ` Bin Cheng
@ 2012-09-28 11:26   ` Pedro Alves
  2012-09-29 11:54   ` Bin Cheng
       [not found]   ` <50669835.e988440a.4421.ffffa17cSMTPIN_ADDED@mx.google.com>
  3 siblings, 0 replies; 16+ messages in thread
From: Pedro Alves @ 2012-09-28 11:26 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Bin Cheng, gcc-patches, Eric Botcazou, Richard Sandiford, vmakarov

On 09/28/2012 09:29 AM, Steven Bosscher wrote:
> On Fri, Sep 28, 2012 at 9:18 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>>         (get_regno_pressure_class, get_pressure_class_and_nregs)
> 
> Broken long lines in a ChangeLog entry end with a ",".

From <http://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs>:

Break long lists of function names by closing continued lines with ‘)’, rather than ‘,’, and opening the continuation with ‘(’ as in this example:

* keyboard.c (menu_bar_items, tool_bar_items)
(Fexecute_extended_command): Deal with 'keymap' property.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-09-28  9:24 ` [PATCH RFA] Implement register pressure directed hoist pass Steven Bosscher
  2012-09-28 11:05   ` Bin Cheng
  2012-09-28 11:26   ` Pedro Alves
@ 2012-09-29 11:54   ` Bin Cheng
  2012-10-02 12:58     ` Jeff Law
       [not found]   ` <50669835.e988440a.4421.ffffa17cSMTPIN_ADDED@mx.google.com>
  3 siblings, 1 reply; 16+ messages in thread
From: Bin Cheng @ 2012-09-29 11:54 UTC (permalink / raw)
  To: 'Steven Bosscher'; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2371 bytes --]


> -----Original Message-----
> From: Steven Bosscher [mailto:stevenb.gcc@gmail.com]
> Sent: Friday, September 28, 2012 4:29 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; Eric Botcazou; Richard Sandiford;
> vmakarov@redhat.com
> Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass
> 
> On Fri, Sep 28, 2012 at 9:18 AM, Bin Cheng <bin.cheng@arm.com> wrote:
> >         (get_regno_pressure_class, get_pressure_class_and_nregs)
> 
> Broken long lines in a ChangeLog entry end with a ",".
> 
> 
> >         (change_pressure, mark_regno_live, mark_regno_death,
mark_reg_death)
> >         (mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)
> 
> Please use the DF caches instead of note_stores, note_uses, etc.
> 
> 
> >         (free_bb_data): New.
> 
> Please use alloc_aux_for_blocks (in calculate_bb_reg_pressure) and
> free_aux_for_block.
> 

Hi Steven,

This is the updated patch according to your comments. Please review.
I also re-collected code size data and found it is improved by about 0.24%
for mips, which is better than previous data. I believe this should be
caused by recent changes in trunk, rather than by using DF caches to
calculate register pressure. 

Thanks.

2012-09-29  Bin Cheng  <bin.cheng@arm.com>

	* common.opt (flag_ira_hoist_pressure): New.
	* doc/invoke.texi (-fira-hoist-pressure): Describe.
	* ira-costs.c (ira_set_pseudo_classes): New parameter.
	* ira.h (ira_set_pseudo_classes): Update prototype.
	* haifa-sched.c (sched_init): Update call.
	* ira.c (ira): Update call.
	* regmove.c (regmove_optimize): Update call.
	* loop-invariant.c (move_loop_invariants): Update call.
	* gcse.c (struct bb_data): New structure.
	(BB_DATA): New macro.
	(curr_bb, curr_regs_live, curr_reg_pressure, regs_set)
	(n_regs_set): New static variables.
	(hoist_expr_reaches_here_p): Use reg pressure to determine the
	distance expr can be hoisted.
	(hoist_code): Use reg pressure to direct the hoist process.
	(get_regno_pressure_class, get_pressure_class_and_nregs)
	(change_pressure, mark_regno_live, mark_regno_death)
	(mark_reg_death, mark_reg_store, calculate_bb_reg_pressure): New.
	(one_code_hoisting_pass): Calculate register pressure. Free data.
	* config/arm/arm.c (arm_option_override): Set
flag_ira_hoist_pressure
	on Thumb1 when optimizing for size.

[-- Attachment #2: hoist-reg-pressure-20120929.txt --]
[-- Type: text/plain, Size: 23852 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 191816)
+++ gcc/doc/invoke.texi	(working copy)
@@ -370,7 +370,7 @@ Objective-C and Objective-C++ Dialects}.
 -finline-small-functions -fipa-cp -fipa-cp-clone @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
 -fira-algorithm=@var{algorithm} @gol
--fira-region=@var{region} @gol
+-fira-region=@var{region} -fira-hoist-pressure @gol
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
@@ -6904,6 +6904,14 @@ This typically results in the smallest code size,
 
 @end table
 
+@item -fira-hoist-pressure
+@opindex fira-hoist-pressure
+Use IRA to evaluate register pressure in hoist pass for decisions to hoist
+expressions.  This option usually results in generation of smaller code on
+RISC machines, but it can slow the compiler down.
+
+This option is enabled at level @option{-Os} for some targets.
+
 @item -fira-loop-pressure
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decisions to move
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	(revision 191816)
+++ gcc/haifa-sched.c	(working copy)
@@ -6629,7 +6629,7 @@ sched_init (void)
 	/* We need info about pseudos for rtl dumps about pseudo
 	   classes and costs.  */
 	regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
+      ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
       sched_regno_pressure_class
 	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
       for (i = 0; i < max_regno; i++)
Index: gcc/regmove.c
===================================================================
--- gcc/regmove.c	(revision 191816)
+++ gcc/regmove.c	(working copy)
@@ -1237,7 +1237,7 @@ regmove_optimize (void)
   regstat_compute_ri ();
 
   if (flag_ira_loop_pressure)
-    ira_set_pseudo_classes (dump_file);
+    ira_set_pseudo_classes (true, dump_file);
 
   regno_src_regno = XNEWVEC (int, nregs);
   for (i = nregs; --i >= 0; )
Index: gcc/gcse.c
===================================================================
--- gcc/gcse.c	(revision 191816)
+++ gcc/gcse.c	(working copy)
@@ -20,9 +20,9 @@ along with GCC; see the file COPYING3.  If not see
 
 /* TODO
    - reordering of memory allocation and freeing to be more space efficient
-   - do rough calc of how many regs are needed in each block, and a rough
-     calc of how many regs are available in each class and use that to
-     throttle back the code in cases where RTX_COST is minimal.
+   - simulate register pressure change of each basic block accurately during
+     hoist process. But I doubt the benefit since most expressions hoisted
+     are constant or address, which usually won't reduce register pressure.
 */
 
 /* References searched while implementing this.
@@ -141,11 +141,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "toplev.h"
 
+#include "hard-reg-set.h"
 #include "rtl.h"
 #include "tree.h"
 #include "tm_p.h"
 #include "regs.h"
-#include "hard-reg-set.h"
+#include "ira.h"
 #include "flags.h"
 #include "insn-config.h"
 #include "recog.h"
@@ -412,6 +413,33 @@ static bool doing_code_hoisting_p = false;
 /* For available exprs */
 static sbitmap *ae_kill;
 \f
+/* Data stored for each basic block.  */
+struct bb_data
+{
+  /* Maximal register pressure inside basic block for given register class
+     (defined only for the pressure classes).  */
+  int max_reg_pressure[N_REG_CLASSES];
+};
+
+#define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
+
+static basic_block curr_bb;
+
+/* Register currently living.  */
+static bitmap_head curr_regs_live;
+
+/* Currently register pressure for each pressure class.  */
+static int curr_reg_pressure[N_REG_CLASSES];
+
+/* Record all regs that are set in any one insn.  Communication from
+   mark_reg_{store,clobber} and global_conflicts.  Asm can refer to
+   all hard-registers.  */
+static rtx regs_set[(FIRST_PSEUDO_REGISTER > MAX_RECOG_OPERANDS
+		     ? FIRST_PSEUDO_REGISTER : MAX_RECOG_OPERANDS) * 2];
+/* Number of regs stored in the previous array.  */
+static int n_regs_set;
+\f
+
 static void compute_can_copy (void);
 static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
 static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
@@ -460,9 +488,11 @@ static void alloc_code_hoist_mem (int, int);
 static void free_code_hoist_mem (void);
 static void compute_code_hoist_vbeinout (void);
 static void compute_code_hoist_data (void);
-static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *,
-				      int, int *);
+static int hoist_expr_reaches_here_p (basic_block, struct expr*, basic_block,
+				      sbitmap, int, int *, enum reg_class,
+				      int *, bitmap_head *);
 static int hoist_code (void);
+static enum reg_class get_pressure_class_and_nregs (rtx insn, int *nregs);
 static int one_code_hoisting_pass (void);
 static rtx process_insert_insn (struct expr *);
 static int pre_edge_insert (struct edge_list *, struct expr **);
@@ -2825,11 +2855,16 @@ compute_code_hoist_data (void)
     fprintf (dump_file, "\n");
 }
 
-/* Determine if the expression identified by EXPR_INDEX would
-   reach BB unimpared if it was placed at the end of EXPR_BB.
-   Stop the search if the expression would need to be moved more
-   than DISTANCE instructions.
+/* Determine if the expression EXPR would reach BB unimpared if it was
+   placed at the end of EXPR_BB. Stop the search if the expression would
+   need to be moved more than DISTANCE instructions.
 
+   PRESSURE_CLASS and NREGS are register class and number of hard registers
+   for storing EXPR. 
+
+   HOISTED_BBS points to a bitmap indicating basic blocks through which
+   EXPR is hoisted.
+
    It's unclear exactly what Muchnick meant by "unimpared".  It seems
    to me that the expression must either be computed or transparent in
    *every* block in the path(s) from EXPR_BB to BB.  Any other definition
@@ -2841,18 +2876,29 @@ compute_code_hoist_data (void)
    paths.  */
 
 static int
-hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
-			   char *visited, int distance, int *bb_size)
+hoist_expr_reaches_here_p (basic_block expr_bb, struct expr *expr,
+			   basic_block bb, sbitmap visited, int distance,
+			   int *bb_size, enum reg_class pressure_class,
+			   int *nregs, bitmap_head *hoisted_bbs)
 {
+  unsigned int i;
   edge pred;
   edge_iterator ei;
+  sbitmap_iterator sbi;
   int visited_allocated_locally = 0;
 
   /* Terminate the search if distance, for which EXPR is allowed to move,
      is exhausted.  */
   if (distance > 0)
     {
-      distance -= bb_size[bb->index];
+      /* Only decrease distance if bb has high register pressure or EXPR
+	 is const expr, otherwise EXPR can be hoisted through bb without
+	 cost.  */
+      if (flag_ira_hoist_pressure != 1
+	  || (BB_DATA (bb)->max_reg_pressure[pressure_class]
+		>= ira_class_hard_regs_num[pressure_class]
+	      || CONST_INT_P (expr->expr)))
+	distance -= bb_size[bb->index];
 
       if (distance <= 0)
 	return 0;
@@ -2863,7 +2909,8 @@ static int
   if (visited == NULL)
     {
       visited_allocated_locally = 1;
-      visited = XCNEWVEC (char, last_basic_block);
+      visited = sbitmap_alloc (last_basic_block);
+      sbitmap_zero (visited);
     }
 
   FOR_EACH_EDGE (pred, ei, bb->preds)
@@ -2874,23 +2921,37 @@ static int
 	break;
       else if (pred_bb == expr_bb)
 	continue;
-      else if (visited[pred_bb->index])
+      else if (TEST_BIT (visited, pred_bb->index))
 	continue;
-
-      else if (! TEST_BIT (transp[pred_bb->index], expr_index))
+      else if (! TEST_BIT (transp[pred_bb->index], expr->bitmap_index))
 	break;
-
       /* Not killed.  */
       else
 	{
-	  visited[pred_bb->index] = 1;
-	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
-					   visited, distance, bb_size))
+	  SET_BIT (visited, pred_bb->index);
+	  if (! hoist_expr_reaches_here_p (expr_bb, expr, pred_bb,
+					   visited, distance, bb_size,
+					   pressure_class, nregs, hoisted_bbs))
 	    break;
 	}
     }
   if (visited_allocated_locally)
-    free (visited);
+    {
+      /* If EXPR can be hoisted to expr_bb, record basic blocks through
+	 which EXPR is hoisted in hoisted_bbs. Also update register
+	 pressure for basic blocks newly added in hoisted_bbs.  */
+      if ((flag_ira_hoist_pressure == 1) && !pred)
+	{
+	  EXECUTE_IF_SET_IN_SBITMAP (visited, 0, i, sbi)
+	    if (!bitmap_bit_p (hoisted_bbs, i))
+	      {
+		bitmap_set_bit (hoisted_bbs, i);
+		BB_DATA (BASIC_BLOCK (i))->max_reg_pressure[pressure_class]
+		    += *nregs;
+	      }
+	}
+      sbitmap_free (visited);
+    }
 
   return (pred == NULL);
 }
@@ -2916,12 +2977,14 @@ hoist_code (void)
   VEC (basic_block, heap) *dom_tree_walk;
   unsigned int dom_tree_walk_index;
   VEC (basic_block, heap) *domby;
-  unsigned int i,j;
+  unsigned int i, j, k;
   struct expr **index_map;
   struct expr *expr;
   int *to_bb_head;
   int *bb_size;
   int changed = 0;
+  struct bb_data *data;
+  bitmap_iterator bi;
 
   /* Compute a mapping from expression number (`bitmap_index') to
      hash table entry.  */
@@ -2977,12 +3040,16 @@ hoist_code (void)
 	{
 	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
+	      int nregs = 0;
+	      enum reg_class pressure_class = NO_REGS;
 	      /* Current expression.  */
 	      struct expr *expr = index_map[i];
 	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
 	      int hoistable = 0;
 	      /* Basic blocks that have occurrences reachable from BB.  */
 	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
+	      /* Basic blocks through which expr is hoisted.  */
+	      bitmap_head _hoisted_bbs, *hoisted_bbs = &_hoisted_bbs;
 	      /* Occurrences reachable from BB.  */
 	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
 	      /* We want to insert the expression into BB only once, so
@@ -2991,6 +3058,8 @@ hoist_code (void)
 	      occr_t occr;
 
 	      bitmap_initialize (from_bbs, 0);
+	      if (flag_ira_hoist_pressure == 1)
+		bitmap_initialize (hoisted_bbs, 0);
 
 	      /* If an expression is computed in BB and is available at end of
 		 BB, hoist all occurrences dominated by BB to BB.  */
@@ -3045,13 +3114,17 @@ hoist_code (void)
 		    max_distance += (bb_size[dominated->index]
 				     - to_bb_head[INSN_UID (occr->insn)]);
 
+		  pressure_class = get_pressure_class_and_nregs (occr->insn,
+								 &nregs);
+
 		  /* Note if the expression would reach the dominated block
 		     unimpared if it was placed at the end of BB.
 
 		     Keep track of how many times this expression is hoistable
 		     from a dominated block into BB.  */
-		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
-						 max_distance, bb_size))
+		  if (hoist_expr_reaches_here_p (bb, expr, dominated, NULL,
+						 max_distance, bb_size, pressure_class,
+						 &nregs, hoisted_bbs))
 		    {
 		      hoistable++;
 		      VEC_safe_push (occr_t, heap,
@@ -3072,6 +3145,13 @@ hoist_code (void)
 		 to nullify any benefit we get from code hoisting.  */
 	      if (hoistable > 1 && dbg_cnt (hoist_insn))
 		{
+		  /* Update register pressure for basic block to which expr
+		     is hoisted.  */
+		  if (flag_ira_hoist_pressure == 1)
+		    {
+		      data = BB_DATA (bb);
+		      data->max_reg_pressure[pressure_class] += nregs;
+		    }
 		  /* If (hoistable != VEC_length), then there is
 		     an occurrence of EXPR in BB itself.  Don't waste
 		     time looking for LCA in this case.  */
@@ -3089,8 +3169,20 @@ hoist_code (void)
 		    }
 		}
 	      else
+	      {
 		/* Punt, no point hoisting a single occurence.  */
 		VEC_free (occr_t, heap, occrs_to_hoist);
+		/* Restore register pressure of basic block recorded in
+ 		   hoisted_bbs when expr will not be hoisted.  */
+		if (flag_ira_hoist_pressure == 1)
+		  EXECUTE_IF_SET_IN_BITMAP (hoisted_bbs, 0, k, bi)
+		    {
+		      data = BB_DATA (BASIC_BLOCK (k));
+		      data->max_reg_pressure[pressure_class] -= nregs;
+		    }
+	      }
+	      if (flag_ira_hoist_pressure == 1)
+		bitmap_clear (hoisted_bbs);
 
 	      insn_inserted_p = 0;
 
@@ -3147,6 +3239,248 @@ hoist_code (void)
   return changed;
 }
 
+/* Return pressure class and number of needed hard registers (through
+   *NREGS) of register REGNO.  */
+static enum reg_class
+get_regno_pressure_class (int regno, int *nregs)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    {
+      enum reg_class pressure_class;
+
+      pressure_class = reg_allocno_class (regno);
+      pressure_class = ira_pressure_class_translate[pressure_class];
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
+      return pressure_class;
+    }
+  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+	   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
+    {
+      *nregs = 1;
+      return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
+    }
+  else
+    {
+      *nregs = 0;
+      return NO_REGS;
+    }
+}
+
+/* Return pressure class and number of hard registers (through *NREGS)
+   for destination of INSN. */
+static enum reg_class
+get_pressure_class_and_nregs (rtx insn, int *nregs)
+{
+  rtx reg;
+  enum reg_class pressure_class;
+  rtx set = single_set (insn);
+
+  /* Considered invariant insns have only one set.  */
+  gcc_assert (set != NULL_RTX);
+  reg = SET_DEST (set);
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  if (MEM_P (reg))
+    {
+      *nregs = 0;
+      pressure_class = NO_REGS;
+    }
+  else
+    {
+      if (! REG_P (reg))
+	reg = NULL_RTX;
+      if (reg == NULL_RTX)
+	pressure_class = GENERAL_REGS;
+      else
+	{
+	  pressure_class = reg_allocno_class (REGNO (reg));
+	  pressure_class = ira_pressure_class_translate[pressure_class];
+	}
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
+    }
+  return pressure_class;
+}
+
+/* Increase (if INCR_P) or decrease current register pressure for
+   register REGNO.  */
+static void
+change_pressure (int regno, bool incr_p)
+{
+  int nregs;
+  enum reg_class pressure_class;
+
+  pressure_class = get_regno_pressure_class (regno, &nregs);
+  if (! incr_p)
+    curr_reg_pressure[pressure_class] -= nregs;
+  else
+    {
+      curr_reg_pressure[pressure_class] += nregs;
+      if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  < curr_reg_pressure[pressure_class])
+	BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  = curr_reg_pressure[pressure_class];
+    }
+}
+
+/* Mark REGNO birth.  */
+static void
+mark_regno_live (int regno)
+{
+  if (!bitmap_set_bit (&curr_regs_live, regno))
+    return;
+  change_pressure (regno, true);
+}
+
+/* Mark REGNO death.  */
+static void
+mark_regno_death (int regno)
+{
+  if (! bitmap_clear_bit (&curr_regs_live, regno))
+    return;
+  change_pressure (regno, false);
+}
+
+/* Mark register REG death.  */
+static void
+mark_reg_death (rtx reg)
+{
+  int regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    mark_regno_death (regno);
+  else
+    {
+      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
+
+      while (regno < last)
+	{
+	  mark_regno_death (regno);
+	  regno++;
+	}
+    }
+}
+
+/* Mark setting register REG.  */
+static void
+mark_reg_store (rtx reg, const_rtx setter ATTRIBUTE_UNUSED,
+		void *data ATTRIBUTE_UNUSED)
+{
+  int regno;
+
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+
+  if (! REG_P (reg))
+    return;
+
+  regs_set[n_regs_set++] = reg;
+
+  regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    mark_regno_live (regno);
+  else
+    {
+      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
+
+      while (regno < last)
+	{
+	  mark_regno_live (regno);
+	  regno++;
+	}
+    }
+}
+
+/* Calculate register pressure of each basic block.  */
+static void
+calculate_bb_reg_pressure (void)
+{
+  int i;
+  unsigned int j;
+  bitmap_iterator bi;
+  basic_block bb;
+  rtx insn, link;
+  df_ref *def_rec;
+
+  ira_setup_eliminable_regset ();
+  bitmap_initialize (&curr_regs_live, &reg_obstack);
+  FOR_EACH_BB (bb)
+    {
+      curr_bb = bb;
+      bitmap_copy (&curr_regs_live, DF_LR_IN (bb));
+      for (i = 0; i < ira_pressure_classes_num; i++)
+	curr_reg_pressure[ira_pressure_classes[i]] = 0;
+      EXECUTE_IF_SET_IN_BITMAP (&curr_regs_live, 0, j, bi)
+	change_pressure (j, true);
+
+      FOR_BB_INSNS (bb, insn)
+	{
+	  if (! NONDEBUG_INSN_P (insn))
+	    continue;
+
+	  n_regs_set = 0;
+	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+	    {
+	      rtx dreg = DF_REF_REG (*def_rec);
+	      if (DF_REF_FLAGS (*def_rec) & DF_REF_MUST_CLOBBER)
+		mark_reg_store (dreg, NULL, NULL);
+	    }
+
+	  /* Mark any registers dead after INSN as dead now.  */
+	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+	    if (REG_NOTE_KIND (link) == REG_DEAD)
+	      mark_reg_death (XEXP (link, 0));
+
+	  /* Mark any registers set in INSN as live,
+	     and mark them as conflicting with all other live regs.
+	     Clobbers are processed again, so they conflict with
+	     the registers that are set.  */
+	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+	    {
+	      rtx dreg = DF_REF_REG (*def_rec);
+	      if (!(DF_REF_FLAGS (*def_rec) & DF_REF_MAY_CLOBBER))
+		mark_reg_store (dreg, NULL, NULL);
+	    }
+
+#ifdef AUTO_INC_DEC
+	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+	    if (REG_NOTE_KIND (link) == REG_INC)
+	      mark_reg_store (XEXP (link, 0), NULL_RTX, NULL);
+#endif
+	  while (n_regs_set-- > 0)
+	    {
+	      rtx note = find_regno_note (insn, REG_UNUSED,
+					  REGNO (regs_set[n_regs_set]));
+	      if (! note)
+		continue;
+
+	      mark_reg_death (XEXP (note, 0));
+	    }
+	}
+    }
+  bitmap_clear (&curr_regs_live);
+
+  if (dump_file == NULL)
+    return;
+  FOR_EACH_BB (bb)
+    {
+      fprintf (dump_file, "\n  Basic block %d Pressure: \n", bb->index);
+      for (i = 0; (int) i < ira_pressure_classes_num; i++)
+	{
+	  enum reg_class pressure_class;
+
+	  pressure_class = ira_pressure_classes[i];
+	  if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0)
+	    continue;
+	  fprintf (dump_file, " %s=%d", reg_class_names[pressure_class],
+		   BB_DATA (bb)->max_reg_pressure[pressure_class]);
+	}
+      fprintf (dump_file, "\n");
+    }
+}
+
 /* Top level routine to perform one code hoisting (aka unification) pass
 
    Return nonzero if a change was made.  */
@@ -3166,6 +3500,16 @@ one_code_hoisting_pass (void)
 
   doing_code_hoisting_p = true;
 
+  /* Calculate register pressure for each basic block.  */
+  if (flag_ira_hoist_pressure == 1)
+    {
+      regstat_init_n_sets_and_refs ();
+      ira_set_pseudo_classes (false, dump_file);
+      alloc_aux_for_blocks (sizeof (struct bb_data));
+      calculate_bb_reg_pressure ();
+      regstat_free_n_sets_and_refs ();
+    }
+
   /* We need alias.  */
   init_alias_analysis ();
 
@@ -3186,6 +3530,11 @@ one_code_hoisting_pass (void)
       free_code_hoist_mem ();
     }
 
+  if (flag_ira_hoist_pressure == 1)
+    {
+      free_aux_for_blocks ();
+      free_reg_info ();
+    }
   free_hash_table (&expr_hash_table);
   free_gcse_mem ();
   obstack_free (&gcse_obstack, NULL);
Index: gcc/loop-invariant.c
===================================================================
--- gcc/loop-invariant.c	(revision 191816)
+++ gcc/loop-invariant.c	(working copy)
@@ -1915,7 +1915,7 @@ move_loop_invariants (void)
     {
       df_analyze ();
       regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (dump_file);
+      ira_set_pseudo_classes (true, dump_file);
       calculate_loop_reg_pressure ();
       regstat_free_n_sets_and_refs ();
     }
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 191816)
+++ gcc/common.opt	(working copy)
@@ -1395,6 +1395,11 @@ Enum(ira_region) String(all) Value(IRA_REGION_ALL)
 EnumValue
 Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
 
+fira-hoist-pressure
+Common Report Var(flag_ira_hoist_pressure) Init(-1)
+Use IRA based register pressure calculation
+in hoist optimizations.
+
 fira-loop-pressure
 Common Report Var(flag_ira_loop_pressure)
 Use IRA based register pressure calculation
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	(revision 191816)
+++ gcc/ira.c	(working copy)
@@ -4164,7 +4164,7 @@ ira (FILE *f)
   crtl->is_leaf = leaf_function_p ();
 
   if (resize_reg_info () && flag_ira_loop_pressure)
-    ira_set_pseudo_classes (ira_dump_file);
+    ira_set_pseudo_classes (true, ira_dump_file);
 
   rebuild_p = update_equiv_regs ();
 
Index: gcc/ira.h
===================================================================
--- gcc/ira.h	(revision 191816)
+++ gcc/ira.h	(working copy)
@@ -125,7 +125,7 @@ extern void ira_init (void);
 extern void ira_finish_once (void);
 extern void ira_setup_eliminable_regset (void);
 extern rtx ira_eliminate_regs (rtx, enum machine_mode);
-extern void ira_set_pseudo_classes (FILE *);
+extern void ira_set_pseudo_classes (bool, FILE *);
 extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
 
 extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	(revision 191816)
+++ gcc/ira-costs.c	(working copy)
@@ -2048,9 +2048,10 @@ ira_costs (void)
   ira_free (total_allocno_costs);
 }
 
-/* Entry function which defines classes for pseudos.  */
+/* Entry function which defines classes for pseudos.
+   Set pseudo_classes_defined_p only if DEFINE_PSEUDO_CLASSES is true.  */
 void
-ira_set_pseudo_classes (FILE *dump_file)
+ira_set_pseudo_classes (bool define_pseudo_classes, FILE *dump_file)
 {
   allocno_p = false;
   internal_flag_ira_verbose = flag_ira_verbose;
@@ -2059,7 +2060,9 @@ void
   initiate_regno_cost_classes ();
   find_costs_and_classes (dump_file);
   finish_regno_cost_classes ();
-  pseudo_classes_defined_p = true;
+  if (define_pseudo_classes)
+    pseudo_classes_defined_p = true;
+
   finish_costs ();
 }
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 191816)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2021,6 +2021,11 @@ arm_option_override (void)
       && current_tune->num_prefetch_slots > 0)
     flag_prefetch_loop_arrays = 1;
 
+  /* Enable register pressure hoist when optimizing for size on Thumb1 set.  */
+  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
+      && flag_ira_hoist_pressure == -1)
+    flag_ira_hoist_pressure = 1;
+
   /* Set up parameters to be used in prefetching algorithm.  Do not override the
      defaults unless we are tuning for a core we have researched values for.  */
   if (current_tune->num_prefetch_slots > 0)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFA] Implement register pressure directed hoist pass
       [not found]   ` <50669835.e988440a.4421.ffffa17cSMTPIN_ADDED@mx.google.com>
@ 2012-10-01  9:40     ` Steven Bosscher
  2012-10-08  2:59       ` Bin Cheng
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Bosscher @ 2012-10-01  9:40 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches

On Sat, Sep 29, 2012 at 8:37 AM, Bin Cheng <bin.cheng@arm.com> wrote:
> This is the updated patch according to your comments. Please review.
> I also re-collected code size data and found it is improved by about 0.24%
> for mips, which is better than previous data. I believe this should be
> caused by recent changes in trunk, rather than by using DF caches to
> calculate register pressure.

Hello,

Thanks for the update. The first look wasn't a very thorough review,
so I have more comments now. Sorry for that, I should have taken the
time for this the first time round...


First, as a general note: Please add a fat comment somewhere before
the code of the pass to explain how the register pressure driven
hoisting pass works. The existing implementation used to be an almost
one-to-one copy from Muchnick's book, but lately the code has moved
away from that (e.g. with Maxim's patches from last year) and it gets
a bit hard to follow without some explanation. You should at least
document your own algorithm, but I'd really appreciate if you could
also spend some time writing down how things work in general and/or
how the GCC implementation differs from Muchnick's original algorithm.


> +Use IRA to evaluate register pressure in hoist pass for decisions to hoist

"the code hoisting pass" (brownie points for an xref to the flag for
code hoisting).


> -   - do rough calc of how many regs are needed in each block, and a rough
> -     calc of how many regs are available in each class and use that to
> -     throttle back the code in cases where RTX_COST is minimal.

This comment still applies to LCM.


> +/* Record all regs that are set in any one insn.  Communication from
> +   mark_reg_{store,clobber} and global_conflicts.  Asm can refer to
> +   all hard-registers.  */
> +static rtx regs_set[(FIRST_PSEUDO_REGISTER > MAX_RECOG_OPERANDS
> +		     ? FIRST_PSEUDO_REGISTER : MAX_RECOG_OPERANDS) * 2];
> +/* Number of regs stored in the previous array.  */
> +static int n_regs_set;

You can use DF_INSN_DEFS for this in calculate_bb_reg_pressure.

But then again...

> +	  while (n_regs_set-- > 0)
> +	    {
> +	      rtx note = find_regno_note (insn, REG_UNUSED,
> +					  REGNO (regs_set[n_regs_set]));
> +	      if (! note)
> +		continue;
> +
> +	      mark_reg_death (XEXP (note, 0));
> +	    }

Why not just mark all registers mentioned in REG_UNUSED notes as death, i.e.:

for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
  if ((REG_NOTE_KIND (note) == REG_UNUSED)
    mark_reg_death (XEXP (note, 0));

?


> +static int hoist_expr_reaches_here_p (basic_block, struct expr*, basic_block,

space between "struct expr" and "*".


> +				      int *, bitmap_head *);

"bitmap_head *" => "bitmap".
Likewise here:

> 	      /* Basic blocks that have occurrences reachable from BB.  */
> 	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
>+	      /* Basic blocks through which expr is hoisted.  */
>+	      bitmap_head _hoisted_bbs, *hoisted_bbs = &_hoisted_bbs;

And using BITMAP_ALLOC here:
> 	      bitmap_initialize (from_bbs, 0);
> +	      if (flag_ira_hoist_pressure == 1)
> +		bitmap_initialize (hoisted_bbs, 0);

(Some older code uses the "bitmap_head *" form, but that pre-dates
coretypes.h. Newer code that uses the "old form" should really be
fixed also.)

Please also use an obstack for the hoisted_bbs and from_bbs bitmaps.
They're currently allocated in GC space but that's completely
unnecessary for objects with such a clearly identifiable life time.
You can even allocate these bitmaps outside the loop, you're already
calling bitmap_clear on them.


> EXPR_BB. Stop

Two spaces after a period in comments.


> @@ -2863,7 +2909,8 @@ static int
>    if (visited == NULL)
>      {
>        visited_allocated_locally = 1;
> -      visited = XCNEWVEC (char, last_basic_block);
> +      visited = sbitmap_alloc (last_basic_block);
> +      sbitmap_zero (visited);
>      }

Can you please submit this as a separate patch? I'd go ahead and
commit it as obvious, but it's always good to have one change per
patch and this bit is independent of the reg-pressure dependent
hoisting.


> +fira-hoist-pressure
> +Common Report Var(flag_ira_hoist_pressure) Init(-1)
> +Use IRA based register pressure calculation
> +in hoist optimizations.

Please add the "Optimization" marker. Why initialize to -1? The
default initialization is 0, and if the flag is set it takes a value
1. If you follow that common behavior, you can replace all occurrences
of "if (flag_ira_hoist_pressure == 1)" with just ""if
(flag_ira_hoist_pressure)".


> +  /* Enable register pressure hoist when optimizing for size on Thumb1 set.  */
> +  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
> +      && flag_ira_hoist_pressure == -1)
> +    flag_ira_hoist_pressure = 1;

One would expect this to be a win on all targets, but you probably
looked at this (at least Thumb2 and plain ARM). Did your patch not
help there? Otherwise, I'd be in favor of enabling this option by
default for all targets.


A few comments about the data flow bits:

> +#ifdef AUTO_INC_DEC
> +	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
> +	    if (REG_NOTE_KIND (link) == REG_INC)
> +	      mark_reg_store (XEXP (link, 0), NULL_RTX, NULL);
> +#endif

AFAIR, you can't have REG_INC notes at this stage except for stack ops
(push/pop). But in any case, DF handles auto-increments insns fine
without looking at REG_INC notes so this piece shouldn't be needed.


> +
> +/* Mark setting register REG.  */
> +static void
> +mark_reg_store (rtx reg, const_rtx setter ATTRIBUTE_UNUSED,
> +		void *data ATTRIBUTE_UNUSED)
> +{
> +  int regno;
> +
> +  if (GET_CODE (reg) == SUBREG)
> +    reg = SUBREG_REG (reg);

Please use DF_REF_REAL_REG in  the caller instead.


> +  if (regno >= FIRST_PSEUDO_REGISTER)
> +    mark_regno_live (regno);
> +  else
> +    {
> +      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
> +
> +      while (regno < last)
> +	{
> +	  mark_regno_live (regno);
> +	  regno++;
> +	}
> +    }

With the DF caches, all "sub-registers" of a multi-word hard register
have a DEF (or USE) in the cache, so you don't have to do this while
loop.

With removing regs_set, using DF_REF_REAL_REG and not looping,
mark_reg_store will be just a wrapper around mark_regno_live so you
can eliminate mark_reg_store entirely :-)


> +calculate_bb_reg_pressure (void)
...
> +      FOR_BB_INSNS (bb, insn)

Ideally, you'd be walking insns from last to first, so that you don't
need REG_DEAD notes and use DF_INSN_DEFS to find dieing registers
(eliminating the need for the loop on multi-word hard registers in
mark_reg_death).

Even better would be if you can use df_simulate_one_insn_backwards and
friends, but I don't immediately see how to hook that up with your
change_pressure() function.


> +  reg = SET_DEST (set);
> +  if (GET_CODE (reg) == SUBREG)
> +    reg = SUBREG_REG (reg);
> +  if (MEM_P (reg))
> +    {
> +      *nregs = 0;
> +      pressure_class = NO_REGS;
> +    }
> +  else
> +    {
> +      if (! REG_P (reg))
> +	reg = NULL_RTX;
> +      if (reg == NULL_RTX)
> +	pressure_class = GENERAL_REGS;

gcc_assert (REG_P (reg))? If reg is not a MEM, it must be a REG or
it's not recorded in compute_hash_table. In fact, AFAIR reg must be a
REG unless flag_gcse_las is set, and I don't know if that even works
with code hoisting. It's not a particularly well tested flag (although
it's a very useful one for most RISC targets, maybe you should have a
look at that, too, for ARM variants).

Ciao!
Steven

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFA] Implement register pressure directed hoist pass
  2012-09-29 11:54   ` Bin Cheng
@ 2012-10-02 12:58     ` Jeff Law
  2012-10-08  3:53       ` Bin Cheng
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Jeff Law @ 2012-10-02 12:58 UTC (permalink / raw)
  To: Bin Cheng; +Cc: 'Steven Bosscher', gcc-patches

On 09/29/2012 12:37 AM, Bin Cheng wrote:
> Hi Steven,
>
> This is the updated patch according to your comments. Please review.
> I also re-collected code size data and found it is improved by about 0.24%
> for mips, which is better than previous data. I believe this should be
> caused by recent changes in trunk, rather than by using DF caches to
> calculate register pressure.
>
> Thanks.
>
> 2012-09-29  Bin Cheng<bin.cheng@arm.com>
>
> 	* common.opt (flag_ira_hoist_pressure): New.
> 	* doc/invoke.texi (-fira-hoist-pressure): Describe.
> 	* ira-costs.c (ira_set_pseudo_classes): New parameter.
> 	* ira.h (ira_set_pseudo_classes): Update prototype.
> 	* haifa-sched.c (sched_init): Update call.
> 	* ira.c (ira): Update call.
> 	* regmove.c (regmove_optimize): Update call.
> 	* loop-invariant.c (move_loop_invariants): Update call.
> 	* gcse.c (struct bb_data): New structure.
> 	(BB_DATA): New macro.
> 	(curr_bb, curr_regs_live, curr_reg_pressure, regs_set)
> 	(n_regs_set): New static variables.
> 	(hoist_expr_reaches_here_p): Use reg pressure to determine the
> 	distance expr can be hoisted.
> 	(hoist_code): Use reg pressure to direct the hoist process.
> 	(get_regno_pressure_class, get_pressure_class_and_nregs)
> 	(change_pressure, mark_regno_live, mark_regno_death)
> 	(mark_reg_death, mark_reg_store, calculate_bb_reg_pressure): New.
> 	(one_code_hoisting_pass): Calculate register pressure. Free data.
> 	* config/arm/arm.c (arm_option_override): Set
> flag_ira_hoist_pressure
> 	on Thumb1 when optimizing for size.
>
>
> hoist-reg-pressure-20120929.txt
> +@item -fira-hoist-pressure
> +@opindex fira-hoist-pressure
> +Use IRA to evaluate register pressure in hoist pass for decisions to hoist
> +expressions.  This option usually results in generation of smaller code on
> +RISC machines, but it can slow the compiler down.
I wouldn't use CISC/RISC here; I'd just say it usually results in 
smaller code.

You need to update the copyright year in gcse.c, ira.h, regmove.c, and 
loop-invariant.c.

> +      /* Only decrease distance if bb has high register pressure or EXPR
> +	 is const expr, otherwise EXPR can be hoisted through bb without
> +	 cost.  */
?!?  This comment makes no sense to me.  To accurately know how hoisting 
an expression affects pressure you have to look at the inputs and output 
and see how their lifetime has changed.

In general:

For inputs, hoisting *may* reduce pressure.  You really have to look at 
how the life of the input changes based on the new location of the insn. 
  For example, if the input's lifetime is unchanged (say perhaps because 
it was live after the insn we want to hoist), then hoisting will have no 
impact. Otherwise the input's life is shortened, but to know by how much 
you have to determine whether the new death of the input occurs (it may 
still die in the hoisted insn or it may die elsewhere).

For an output, hoisting will (effectively) always extend the lifetime.

I've speculated that the right way to deal with register pressure in 
code motion is to actually build the dependency graph and use that to 
guide the code motions.  I've never cobbled together any real code to do 
this though.

Can we find a better name for hoist_expr_reaches_here_p since it's no 
longer just dealing with reachability -- it has heuristics now for 
profitability as well.

> @@ -2863,7 +2909,8 @@ static int
>     if (visited == NULL)
>       {
>         visited_allocated_locally = 1;
> -      visited = XCNEWVEC (char, last_basic_block);
> +      visited = sbitmap_alloc (last_basic_block);
> +      sbitmap_zero (visited);
>       }
What's the purpose behind changing visited from a simple array to a 
sbitmap?  I'm not objecting, but would like to hear the rationale behind 
that change.  I'll also note it wasn't mentioned in the ChangeLog.

Similarly what's the rationale behind passing the expression itself 
rather than just its index?  I don't see where we need to use anything 
other than the index in this code.  And again, this change isn't 
mentioned in the ChangeLog.

> +  /* Considered invariant insns have only one set.  */
> +  gcc_assert (set != NULL_RTX);
> +  reg = SET_DEST (set);
> +  if (GET_CODE (reg) == SUBREG)
> +    reg = SUBREG_REG (reg);
> +  if (MEM_P (reg))
> +    {
> +      *nregs = 0;
> +      pressure_class = NO_REGS;
> +    }
Don't you need to look at the addresses within the MEM?



> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c	(revision 191816)
> +++ gcc/config/arm/arm.c	(working copy)
> @@ -2021,6 +2021,11 @@ arm_option_override (void)
>         && current_tune->num_prefetch_slots > 0)
>       flag_prefetch_loop_arrays = 1;
>
> +  /* Enable register pressure hoist when optimizing for size on Thumb1 set.  */
> +  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
> +      && flag_ira_hoist_pressure == -1)
> +    flag_ira_hoist_pressure = 1;
I'd rather see this enabled for all targets when optimizing for size; 
that guarantees more testing.  Even if it doesn't help a particular 
target, as long as it isn't hurting we're better off enabling it.

I don't see any testsuite additions -- it'd really help long term to 
have some tests for this stuff.

You should address the issues noted above and repost for final review 
before likely integration.



jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-10-01  9:40     ` Steven Bosscher
@ 2012-10-08  2:59       ` Bin Cheng
  0 siblings, 0 replies; 16+ messages in thread
From: Bin Cheng @ 2012-10-08  2:59 UTC (permalink / raw)
  To: 'Steven Bosscher'; +Cc: gcc-patches

Hi Steven,
Thanks for the comments and sorry for the delay with this message.

> -----Original Message-----
> 
> Hello,
> 
> Thanks for the update. The first look wasn't a very thorough review, so I
have
> more comments now. Sorry for that, I should have taken the time for this
the
> first time round...

It's OK.

> > -   - do rough calc of how many regs are needed in each block, and a
rough
> > -     calc of how many regs are available in each class and use that to
> > -     throttle back the code in cases where RTX_COST is minimal.
> 
> This comment still applies to LCM.

It seems this comment just express the idea using register pressure
information to direct code movement for expressions with small
RTX_COST(max_distance). But the threshold of max_distance is only
implemented for hoist in GCC, I assume this comment does not affect LCM.
Could you explain more how this comments applies to LCM?

> But then again...
> 
> > +	  while (n_regs_set-- > 0)
> > +	    {
> > +	      rtx note = find_regno_note (insn, REG_UNUSED,
> > +					  REGNO (regs_set[n_regs_set]));
> > +	      if (! note)
> > +		continue;
> > +
> > +	      mark_reg_death (XEXP (note, 0));
> > +	    }
> 
> Why not just mark all registers mentioned in REG_UNUSED notes as death,
i.e.:
> 
> for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
>   if ((REG_NOTE_KIND (note) == REG_UNUSED)
>     mark_reg_death (XEXP (note, 0));
> 
> ?

I don't know the exact reason, can only guess REG_UNUSED notes are recorded
only for set registers here. Since this calculation code is a copy of
existing codes, maybe Vlad can help explain this? Thanks.

> 
> 
> > +fira-hoist-pressure
> > +Common Report Var(flag_ira_hoist_pressure) Init(-1) Use IRA based
> > +register pressure calculation in hoist optimizations.
> 
> Please add the "Optimization" marker. Why initialize to -1? The default
> initialization is 0, and if the flag is set it takes a value 1. If you
follow
> that common behavior, you can replace all occurrences of "if
> (flag_ira_hoist_pressure == 1)" with just ""if (flag_ira_hoist_pressure)".
> 

The reason is because we only want to enable this for Thumb1 originally, and
make it possible to disable it for Thumb1 on command line by specifying
"-fno-ira-hoist-pressure".

> 
> > +  /* Enable register pressure hoist when optimizing for size on
> > + Thumb1 set.  */  if (TARGET_THUMB1 && optimize_function_for_size_p
(cfun)
> > +      && flag_ira_hoist_pressure == -1)
> > +    flag_ira_hoist_pressure = 1;
> 
> One would expect this to be a win on all targets, but you probably looked
at
> this (at least Thumb2 and plain ARM). Did your patch not help there?
Otherwise,
> I'd be in favor of enabling this option by default for all targets.

From the data I collected, it helps for Thumb1/ARM/Mips and maybe x86_64 a
little bit. It has no obvious effect on Thumb2/x86.
Since Jeff also expressed that it might be enabled for all targets, I will
collect size data for more targets. Your previous concern will also be
addressed in this way.

> 
> 
> > +  reg = SET_DEST (set);
> > +  if (GET_CODE (reg) == SUBREG)
> > +    reg = SUBREG_REG (reg);
> > +  if (MEM_P (reg))
> > +    {
> > +      *nregs = 0;
> > +      pressure_class = NO_REGS;
> > +    }
> > +  else
> > +    {
> > +      if (! REG_P (reg))
> > +	reg = NULL_RTX;
> > +      if (reg == NULL_RTX)
> > +	pressure_class = GENERAL_REGS;
> 
> gcc_assert (REG_P (reg))? If reg is not a MEM, it must be a REG or it's
not
> recorded in compute_hash_table. In fact, AFAIR reg must be a REG unless
> flag_gcse_las is set, and I don't know if that even works with code
hoisting.
> It's not a particularly well tested flag (although it's a very useful one
for
> most RISC targets, maybe you should have a look at that, too, for ARM
> variants).

Again, it is a copy from loop-invariant.c. It might be rewritten/optimized
for hoist, but should we keep it same with original copy in
loop-invariant.c?

> 
> Ciao!
> Steven

Thanks again for your review, all other comments are accepted and I will
work out an updated patch for review.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-10-02 12:58     ` Jeff Law
@ 2012-10-08  3:53       ` Bin Cheng
  2012-10-11  7:10       ` Bin Cheng
       [not found]       ` <50766d69.a853420a.2475.fffffbafSMTPIN_ADDED@mx.google.com>
  2 siblings, 0 replies; 16+ messages in thread
From: Bin Cheng @ 2012-10-08  3:53 UTC (permalink / raw)
  To: 'Jeff Law'; +Cc: 'Steven Bosscher', gcc-patches

Hi Jeff,
Thanks for reviewing and sorry for this delayed message.

> 
> > +      /* Only decrease distance if bb has high register pressure or
EXPR
> > +	 is const expr, otherwise EXPR can be hoisted through bb without
> > +	 cost.  */
> ?!?  This comment makes no sense to me.  To accurately know how hoisting
> an expression affects pressure you have to look at the inputs and output
> and see how their lifetime has changed.
> 
> In general:
> 
> For inputs, hoisting *may* reduce pressure.  You really have to look at
> how the life of the input changes based on the new location of the insn.
>   For example, if the input's lifetime is unchanged (say perhaps because
> it was live after the insn we want to hoist), then hoisting will have no
> impact. Otherwise the input's life is shortened, but to know by how much
> you have to determine whether the new death of the input occurs (it may
> still die in the hoisted insn or it may die elsewhere).
> 
> For an output, hoisting will (effectively) always extend the lifetime.
> 
> I've speculated that the right way to deal with register pressure in
> code motion is to actually build the dependency graph and use that to
> guide the code motions.  I've never cobbled together any real code to do
> this though.

Yes, that's exactly what I mean "simulate register pressure change of each
basic block accurately" in the patch. The current patch works in the way it
only extend the lifetime of output registers of hoisted expressions, while
does not consider the possibility of decreasing of register pressure because
of input registers. The rationales are:
1. Calculation live range info and maintain of the info during hoisting
process are very costly.
2. From the observation during the work, I found most of expression hoisted
are constants and base-addresses, means the possibility of decreasing
register pressure in hoisting expression is relatively low.

This patch does not decrease distance if the basic block's register pressure
is low, making the expression can be hoisted up in flow graph longer. When
register pressure is high, the patch retreats to original algorithm.

Also I observed lots of size regressions by hoisting constant expressions
aggressively along with other expressions, so the patch uses original
heuristic for constant expressions.

> > @@ -2863,7 +2909,8 @@ static int
> >     if (visited == NULL)
> >       {
> >         visited_allocated_locally = 1;
> > -      visited = XCNEWVEC (char, last_basic_block);
> > +      visited = sbitmap_alloc (last_basic_block);
> > +      sbitmap_zero (visited);
> >       }
> What's the purpose behind changing visited from a simple array to a
> sbitmap?  I'm not objecting, but would like to hear the rationale behind
> that change.  I'll also note it wasn't mentioned in the ChangeLog.

Because I have to iterate over visited basic blocks to update register
pressure information, the operation is less effective on integer array.
Maybe I should submit this as a separate patch as suggested by Steven.

> 
> Similarly what's the rationale behind passing the expression itself
> rather than just its index?  I don't see where we need to use anything
> other than the index in this code.  And again, this change isn't
> mentioned in the ChangeLog.

The expression itself is needed to check "CONST_INT_P (expr->expr)".

> 
> > +  /* Considered invariant insns have only one set.  */
> > +  gcc_assert (set != NULL_RTX);
> > +  reg = SET_DEST (set);
> > +  if (GET_CODE (reg) == SUBREG)
> > +    reg = SUBREG_REG (reg);
> > +  if (MEM_P (reg))
> > +    {
> > +      *nregs = 0;
> > +      pressure_class = NO_REGS;
> > +    }
> Don't you need to look at the addresses within the MEM?

This function is a copy from register pressure calculation code in
loop-invariant.c
I think MEM should be ignored here, because as in function hash_scan_set, we
won't honor stores in memory when flag_gcse_las is not enabled. 

> 
> > Index: gcc/config/arm/arm.c
> > ===================================================================
> > --- gcc/config/arm/arm.c	(revision 191816)
> > +++ gcc/config/arm/arm.c	(working copy)
> > @@ -2021,6 +2021,11 @@ arm_option_override (void)
> >         && current_tune->num_prefetch_slots > 0)
> >       flag_prefetch_loop_arrays = 1;
> >
> > +  /* Enable register pressure hoist when optimizing for size on Thumb1
set.
> */
> > +  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
> > +      && flag_ira_hoist_pressure == -1)
> > +    flag_ira_hoist_pressure = 1;
> I'd rather see this enabled for all targets when optimizing for size;
> that guarantees more testing.  Even if it doesn't help a particular
> target, as long as it isn't hurting we're better off enabling it.

I will collect size data for more targets.

> 
> I don't see any testsuite additions -- it'd really help long term to
> have some tests for this stuff.

It's some kind of difficult to create stable cases for hoist, especially
having register pressure information incorporated. I will try it anyway.
Could you give me some advices on this?

> 
> You should address the issues noted above and repost for final review
> before likely integration.
> 
All other comments are accepted.
I will work out an updated patch addressing all your/Steven's concerns, if
you agree my explanations in this message.

Thanks.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-10-02 12:58     ` Jeff Law
  2012-10-08  3:53       ` Bin Cheng
@ 2012-10-11  7:10       ` Bin Cheng
       [not found]       ` <50766d69.a853420a.2475.fffffbafSMTPIN_ADDED@mx.google.com>
  2 siblings, 0 replies; 16+ messages in thread
From: Bin Cheng @ 2012-10-11  7:10 UTC (permalink / raw)
  To: 'Jeff Law'; +Cc: 'Steven Bosscher', gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8443 bytes --]

Hi Jeff, Steven,

This is the updated patch according to your comments. The main changes
includes:
1. Enable the option for all target at Os level by default.
2. Add a target independent test case.
3. Add comments on how the algorithm works.
4. Simplify the calculation of register pressure info by using DF caches.

Jeff, as for accurately simulating the change of register pressure made by
hoisting expression, I explained a little in previous message. According to
observation, cases in which register pressure decreases are relatively rare,
so this patch does not implement it. I can continue to improve code hoisting
that way in the future.

I bootstrapped it on x86_64 and x86, also tested on x86_64/x86/arm.
I recollected size info for CSiBE on miscellaneous targets as following:
		improvement
ARM		0.12%
Thumb1		0.13%
mips		0.24%
powerpc	0.59%
Thumb2		X
x86		X
x86_64		X
(X means no obvious effect, unfortunately.)

Also, since this is the simplification of previous patch, I think there is
no slowdown in compilation.
 
Please review. Thanks.

And updated ChangeLog entry:

2012-10-11  Bin Cheng  <bin.cheng@arm.com>

	* common.opt (flag_ira_hoist_pressure): New.
	* doc/invoke.texi (-fira-hoist-pressure): Describe.
	* ira-costs.c (ira_set_pseudo_classes): New parameter.
	* ira.h: Update copyright dates.
	(ira_set_pseudo_classes): Update prototype.
	* haifa-sched.c (sched_init): Update call.
	* ira.c (ira): Update call.
	* regmove.c: Update copyright dates.
	(regmove_optimize): Update call.
	* loop-invariant.c: Update copyright dates.
	(move_loop_invariants): Update call.
	* gcse.c: Update copyright dates.
	(struct bb_data): New structure.
	(BB_DATA): New macro.
	(curr_bb, curr_reg_pressure): New static variables.
	(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
	Change parameter expr_index to expr.
	Change parameter visited.
	New parameters pressure_class, nregs and hoisted_bbs.
	Use reg pressure to determine the distance expr can be hoisted.
	(hoist_code): Use reg pressure to direct the hoist process.
	(get_regno_pressure_class, get_pressure_class_and_nregs)
	(change_pressure, calculate_bb_reg_pressure): New.
	(one_code_hoisting_pass): Calculate register pressure. Allocate
	and free data.

gcc/testsuite/ChangeLog
2012-10-11  Bin Cheng  <bin.cheng@arm.com>

	* testsuite/gcc.dg/hoist-register-pressure.c: New test.


> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Tuesday, October 02, 2012 8:58 PM
> To: Bin Cheng
> Cc: 'Steven Bosscher'; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass
> 
> On 09/29/2012 12:37 AM, Bin Cheng wrote:
> > Hi Steven,
> >
> > This is the updated patch according to your comments. Please review.
> > I also re-collected code size data and found it is improved by about
> > 0.24% for mips, which is better than previous data. I believe this
> > should be caused by recent changes in trunk, rather than by using DF
> > caches to calculate register pressure.
> >
> > Thanks.
> >
> > 2012-09-29  Bin Cheng<bin.cheng@arm.com>
> >
> > 	* common.opt (flag_ira_hoist_pressure): New.
> > 	* doc/invoke.texi (-fira-hoist-pressure): Describe.
> > 	* ira-costs.c (ira_set_pseudo_classes): New parameter.
> > 	* ira.h (ira_set_pseudo_classes): Update prototype.
> > 	* haifa-sched.c (sched_init): Update call.
> > 	* ira.c (ira): Update call.
> > 	* regmove.c (regmove_optimize): Update call.
> > 	* loop-invariant.c (move_loop_invariants): Update call.
> > 	* gcse.c (struct bb_data): New structure.
> > 	(BB_DATA): New macro.
> > 	(curr_bb, curr_regs_live, curr_reg_pressure, regs_set)
> > 	(n_regs_set): New static variables.
> > 	(hoist_expr_reaches_here_p): Use reg pressure to determine the
> > 	distance expr can be hoisted.
> > 	(hoist_code): Use reg pressure to direct the hoist process.
> > 	(get_regno_pressure_class, get_pressure_class_and_nregs)
> > 	(change_pressure, mark_regno_live, mark_regno_death)
> > 	(mark_reg_death, mark_reg_store, calculate_bb_reg_pressure): New.
> > 	(one_code_hoisting_pass): Calculate register pressure. Free data.
> > 	* config/arm/arm.c (arm_option_override): Set
flag_ira_hoist_pressure
> > 	on Thumb1 when optimizing for size.
> >
> >
> > hoist-reg-pressure-20120929.txt
> > +@item -fira-hoist-pressure
> > +@opindex fira-hoist-pressure
> > +Use IRA to evaluate register pressure in hoist pass for decisions to
hoist
> > +expressions.  This option usually results in generation of smaller code
on
> > +RISC machines, but it can slow the compiler down.
> I wouldn't use CISC/RISC here; I'd just say it usually results in
> smaller code.
> 
> You need to update the copyright year in gcse.c, ira.h, regmove.c, and
> loop-invariant.c.
> 
> > +      /* Only decrease distance if bb has high register pressure or
EXPR
> > +	 is const expr, otherwise EXPR can be hoisted through bb without
> > +	 cost.  */
> ?!?  This comment makes no sense to me.  To accurately know how hoisting
> an expression affects pressure you have to look at the inputs and output
> and see how their lifetime has changed.
> 
> In general:
> 
> For inputs, hoisting *may* reduce pressure.  You really have to look at
> how the life of the input changes based on the new location of the insn.
>   For example, if the input's lifetime is unchanged (say perhaps because
> it was live after the insn we want to hoist), then hoisting will have no
> impact. Otherwise the input's life is shortened, but to know by how much
> you have to determine whether the new death of the input occurs (it may
> still die in the hoisted insn or it may die elsewhere).
> 
> For an output, hoisting will (effectively) always extend the lifetime.
> 
> I've speculated that the right way to deal with register pressure in
> code motion is to actually build the dependency graph and use that to
> guide the code motions.  I've never cobbled together any real code to do
> this though.
> 
> Can we find a better name for hoist_expr_reaches_here_p since it's no
> longer just dealing with reachability -- it has heuristics now for
> profitability as well.
> 
> > @@ -2863,7 +2909,8 @@ static int
> >     if (visited == NULL)
> >       {
> >         visited_allocated_locally = 1;
> > -      visited = XCNEWVEC (char, last_basic_block);
> > +      visited = sbitmap_alloc (last_basic_block);
> > +      sbitmap_zero (visited);
> >       }
> What's the purpose behind changing visited from a simple array to a
> sbitmap?  I'm not objecting, but would like to hear the rationale behind
> that change.  I'll also note it wasn't mentioned in the ChangeLog.
> 
> Similarly what's the rationale behind passing the expression itself
> rather than just its index?  I don't see where we need to use anything
> other than the index in this code.  And again, this change isn't
> mentioned in the ChangeLog.
> 
> > +  /* Considered invariant insns have only one set.  */
> > +  gcc_assert (set != NULL_RTX);
> > +  reg = SET_DEST (set);
> > +  if (GET_CODE (reg) == SUBREG)
> > +    reg = SUBREG_REG (reg);
> > +  if (MEM_P (reg))
> > +    {
> > +      *nregs = 0;
> > +      pressure_class = NO_REGS;
> > +    }
> Don't you need to look at the addresses within the MEM?
> 
> 
> 
> > Index: gcc/config/arm/arm.c
> > ===================================================================
> > --- gcc/config/arm/arm.c	(revision 191816)
> > +++ gcc/config/arm/arm.c	(working copy)
> > @@ -2021,6 +2021,11 @@ arm_option_override (void)
> >         && current_tune->num_prefetch_slots > 0)
> >       flag_prefetch_loop_arrays = 1;
> >
> > +  /* Enable register pressure hoist when optimizing for size on Thumb1
set.
> */
> > +  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
> > +      && flag_ira_hoist_pressure == -1)
> > +    flag_ira_hoist_pressure = 1;
> I'd rather see this enabled for all targets when optimizing for size;
> that guarantees more testing.  Even if it doesn't help a particular
> target, as long as it isn't hurting we're better off enabling it.
> 
> I don't see any testsuite additions -- it'd really help long term to
> have some tests for this stuff.
> 
> You should address the issues noted above and repost for final review
> before likely integration.
> 
> 
> 
> jeff
> 

[-- Attachment #2: hoist-reg-pressure-20121011.txt --]
[-- Type: text/plain, Size: 26559 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 192194)
+++ gcc/doc/invoke.texi	(working copy)
@@ -372,7 +372,7 @@ Objective-C and Objective-C++ Dialects}.
 -finline-small-functions -fipa-cp -fipa-cp-clone @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
 -fira-algorithm=@var{algorithm} @gol
--fira-region=@var{region} @gol
+-fira-region=@var{region} -fira-hoist-pressure @gol
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
@@ -6996,6 +6996,14 @@ This typically results in the smallest code size,
 
 @end table
 
+@item -fira-hoist-pressure
+@opindex fira-hoist-pressure
+Use IRA to evaluate register pressure in the code hoisting pass for
+decisions to hoist expressions.  This option usually results in smaller
+code, but it can slow the compiler down.
+
+This option is enabled at level @option{-Os} for all targets.
+
 @item -fira-loop-pressure
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decisions to move
Index: gcc/testsuite/gcc.dg/hoist-register-pressure.c
===================================================================
--- gcc/testsuite/gcc.dg/hoist-register-pressure.c	(revision 0)
+++ gcc/testsuite/gcc.dg/hoist-register-pressure.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-options "-Os -fdump-rtl-hoist" }  */
+/* { dg-final { scan-rtl-dump "PRE/HOIST: end of bb .* copying expression" "hoist" } } */
+
+#define BUF 100
+int a[BUF];
+
+void com (int);
+void bar (int);
+
+int foo (int x, int y, int z)
+{
+  /* x+y won't be hoisted without defaultly enabled "-fira-hoist-pressure",
+     because its rtx_cost is too small.  */
+  if (z)
+    {
+      a[1] = a[0] + a[2];
+      a[2] = a[1] + a[3];
+      a[3] = a[2] + a[4];
+      a[4] = a[3] + a[5];
+      a[5] = a[4] + a[6];
+      a[6] = a[5] + a[7];
+      a[7] = a[6] + a[8];
+      com (x+y);
+    }
+  else
+    {
+      bar (x+y);
+    }
+
+  return 0;
+}
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	(revision 192194)
+++ gcc/haifa-sched.c	(working copy)
@@ -6629,7 +6629,7 @@ sched_init (void)
 	/* We need info about pseudos for rtl dumps about pseudo
 	   classes and costs.  */
 	regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
+      ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
       sched_regno_pressure_class
 	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
       for (i = 0; i < max_regno; i++)
Index: gcc/regmove.c
===================================================================
--- gcc/regmove.c	(revision 192194)
+++ gcc/regmove.c	(working copy)
@@ -1,6 +1,7 @@
 /* Move registers around to reduce number of move instructions needed.
    Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
-   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
+   2012
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -1237,7 +1238,7 @@ regmove_optimize (void)
   regstat_compute_ri ();
 
   if (flag_ira_loop_pressure)
-    ira_set_pseudo_classes (dump_file);
+    ira_set_pseudo_classes (true, dump_file);
 
   regno_src_regno = XNEWVEC (int, nregs);
   for (i = nregs; --i >= 0; )
Index: gcc/gcse.c
===================================================================
--- gcc/gcse.c	(revision 192194)
+++ gcc/gcse.c	(working copy)
@@ -1,6 +1,6 @@
 /* Partial redundancy elimination / Hoisting for RTL.
    Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
-   2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -20,9 +20,11 @@ along with GCC; see the file COPYING3.  If not see
 
 /* TODO
    - reordering of memory allocation and freeing to be more space efficient
-   - do rough calc of how many regs are needed in each block, and a rough
-     calc of how many regs are available in each class and use that to
-     throttle back the code in cases where RTX_COST is minimal.
+   - simulate register pressure change of each basic block accurately during
+     hoist process.  But I doubt the benefit since most expressions hoisted
+     are constant or address, which usually won't reduce register pressure.
+   - calc rough register pressure information and use the info to drive all
+     kinds of code motion(including code hoisting) in a unified way.
 */
 
 /* References searched while implementing this.
@@ -141,11 +143,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "toplev.h"
 
+#include "hard-reg-set.h"
 #include "rtl.h"
 #include "tree.h"
 #include "tm_p.h"
 #include "regs.h"
-#include "hard-reg-set.h"
+#include "ira.h"
 #include "flags.h"
 #include "insn-config.h"
 #include "recog.h"
@@ -412,6 +415,22 @@ static bool doing_code_hoisting_p = false;
 /* For available exprs */
 static sbitmap *ae_kill;
 \f
+/* Data stored for each basic block.  */
+struct bb_data
+{
+  /* Maximal register pressure inside basic block for given register class
+     (defined only for the pressure classes).  */
+  int max_reg_pressure[N_REG_CLASSES];
+};
+
+#define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
+
+static basic_block curr_bb;
+
+/* Currently register pressure for each pressure class.  */
+static int curr_reg_pressure[N_REG_CLASSES];
+\f
+
 static void compute_can_copy (void);
 static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
 static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
@@ -460,9 +479,11 @@ static void alloc_code_hoist_mem (int, int);
 static void free_code_hoist_mem (void);
 static void compute_code_hoist_vbeinout (void);
 static void compute_code_hoist_data (void);
-static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *,
-				      int, int *);
+static int should_hoist_expr_to_dom (basic_block, struct expr *, basic_block,
+				     sbitmap, int, int *, enum reg_class,
+				     int *, bitmap);
 static int hoist_code (void);
+static enum reg_class get_pressure_class_and_nregs (rtx insn, int *nregs);
 static int one_code_hoisting_pass (void);
 static rtx process_insert_insn (struct expr *);
 static int pre_edge_insert (struct edge_list *, struct expr **);
@@ -1857,7 +1878,7 @@ prune_expressions (bool pre_p)
 	 a basic block we should account for any side-effects of a subsequent
 	 jump instructions that could clobber the expression.  It would
 	 be best to implement this check along the lines of
-	 hoist_expr_reaches_here_p where the target block is already known
+	 should_hoist_expr_to_dom where the target block is already known
 	 and, hence, there's no need to conservatively prune expressions on
 	 "intermediate" set-and-jump instructions.  */
       FOR_EACH_EDGE (e, ei, bb->preds)
@@ -2825,11 +2846,16 @@ compute_code_hoist_data (void)
     fprintf (dump_file, "\n");
 }
 
-/* Determine if the expression identified by EXPR_INDEX would
-   reach BB unimpared if it was placed at the end of EXPR_BB.
-   Stop the search if the expression would need to be moved more
-   than DISTANCE instructions.
+/* Determine if the expression EXPR should be hoisted to EXPR_BB up in the
+   flow graph, given if it can reach BB unimpared.  Stop the search if the
+   expression would need to be moved more than DISTANCE instructions.
 
+   PRESSURE_CLASS and NREGS are register class and number of hard registers
+   for storing EXPR.
+
+   HOISTED_BBS points to a bitmap indicating basic blocks through which
+   EXPR is hoisted.
+
    It's unclear exactly what Muchnick meant by "unimpared".  It seems
    to me that the expression must either be computed or transparent in
    *every* block in the path(s) from EXPR_BB to BB.  Any other definition
@@ -2841,18 +2867,30 @@ compute_code_hoist_data (void)
    paths.  */
 
 static int
-hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
-			   char *visited, int distance, int *bb_size)
+should_hoist_expr_to_dom (basic_block expr_bb, struct expr *expr,
+			  basic_block bb, sbitmap visited, int distance,
+			  int *bb_size, enum reg_class pressure_class,
+			  int *nregs, bitmap hoisted_bbs)
 {
+  unsigned int i;
   edge pred;
   edge_iterator ei;
+  sbitmap_iterator sbi;
   int visited_allocated_locally = 0;
 
   /* Terminate the search if distance, for which EXPR is allowed to move,
      is exhausted.  */
   if (distance > 0)
     {
-      distance -= bb_size[bb->index];
+      /* Let EXPR be hoisted through basic block at no cost if the block
+	 has low register pressure.  An exception is constant expression,
+	 becasue hoisting constant expr aggresively results in worse code
+	 as observed.  */
+      if (!flag_ira_hoist_pressure
+	  || (BB_DATA (bb)->max_reg_pressure[pressure_class]
+		>= ira_class_hard_regs_num[pressure_class]
+	      || CONST_INT_P (expr->expr)))
+	distance -= bb_size[bb->index];
 
       if (distance <= 0)
 	return 0;
@@ -2863,7 +2901,8 @@ static int
   if (visited == NULL)
     {
       visited_allocated_locally = 1;
-      visited = XCNEWVEC (char, last_basic_block);
+      visited = sbitmap_alloc (last_basic_block);
+      sbitmap_zero (visited);
     }
 
   FOR_EACH_EDGE (pred, ei, bb->preds)
@@ -2874,23 +2913,37 @@ static int
 	break;
       else if (pred_bb == expr_bb)
 	continue;
-      else if (visited[pred_bb->index])
+      else if (TEST_BIT (visited, pred_bb->index))
 	continue;
-
-      else if (! TEST_BIT (transp[pred_bb->index], expr_index))
+      else if (! TEST_BIT (transp[pred_bb->index], expr->bitmap_index))
 	break;
-
       /* Not killed.  */
       else
 	{
-	  visited[pred_bb->index] = 1;
-	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
-					   visited, distance, bb_size))
+	  SET_BIT (visited, pred_bb->index);
+	  if (! should_hoist_expr_to_dom (expr_bb, expr, pred_bb,
+					  visited, distance, bb_size,
+					  pressure_class, nregs, hoisted_bbs))
 	    break;
 	}
     }
   if (visited_allocated_locally)
-    free (visited);
+    {
+      /* If EXPR can be hoisted to expr_bb, record basic blocks through
+	 which EXPR is hoisted in hoisted_bbs.  Also update register
+	 pressure for basic blocks newly added in hoisted_bbs.  */
+      if (flag_ira_hoist_pressure && !pred)
+	{
+	  EXECUTE_IF_SET_IN_SBITMAP (visited, 0, i, sbi)
+	    if (!bitmap_bit_p (hoisted_bbs, i))
+	      {
+		bitmap_set_bit (hoisted_bbs, i);
+		BB_DATA (BASIC_BLOCK (i))->max_reg_pressure[pressure_class]
+		    += *nregs;
+	      }
+	}
+      sbitmap_free (visited);
+    }
 
   return (pred == NULL);
 }
@@ -2907,8 +2960,44 @@ find_occr_in_bb (struct occr *occr, basic_block bb
   return occr;
 }
 
-/* Actually perform code hoisting.  */
+/* Actually perform code hoisting.
 
+   The code hoisting pass hoists multiple computations of expression
+   to dominator basic block, like from b2/b3 to b1 as depicted below:
+
+          b1      ------
+          /\         |
+         /  \        |
+        bx   by   distance
+       /      \      |
+      /        \     |
+     b2        b3 ------
+
+   Unfortunately code hoisting generally extend live range of output
+   pseudo register, which increases register pressure and hurts register
+   allocation.  To address this issue, an attribute MAX_DISTANCE is
+   computed and attached to each expression.  The attribute is computed
+   from rtx cost of corresponding expression and it's used to control
+   how long the expression can be hoisted up in flow graph.  As expression
+   is hoisted up in flow graph, GCC decreases its DISTANCE and stops the
+   hoist if DISTANCE reaches 0.
+
+   Option "-fira-hoist-pressure" implements register pressure directed
+   hoist based on upper method.  The rationale is:
+     1. Calculate register pressure for each basic block by reusing IRA
+	facility.
+     2. When expression is hoisted through one basic block, GCC checks
+	register pressure of the basic block and decrease DISTANCE only
+	when the register pressure is high.  In other words, expression
+	will be hoisted through basic block with low register pressure
+	at no cost.
+     3. Update register pressure information for basic blocks through
+ 	which expression is hoisted.
+	TODO: It is possible to have register pressure decreased because
+	of shrinked live range of input pseudo register when hoisting an
+	expression.  For now, this effect is not simulated and we just
+	increase register pressure for hoisted expressions.  */
+
 static int
 hoist_code (void)
 {
@@ -2916,12 +3005,18 @@ hoist_code (void)
   VEC (basic_block, heap) *dom_tree_walk;
   unsigned int dom_tree_walk_index;
   VEC (basic_block, heap) *domby;
-  unsigned int i,j;
+  unsigned int i, j, k;
   struct expr **index_map;
   struct expr *expr;
   int *to_bb_head;
   int *bb_size;
   int changed = 0;
+  struct bb_data *data;
+  /* Basic blocks that have occurrences reachable from BB.  */
+  bitmap from_bbs;
+  /* Basic blocks through which expr is hoisted.  */
+  bitmap hoisted_bbs = NULL;
+  bitmap_iterator bi;
 
   /* Compute a mapping from expression number (`bitmap_index') to
      hash table entry.  */
@@ -2959,6 +3054,10 @@ hoist_code (void)
 	      && (EDGE_SUCC (ENTRY_BLOCK_PTR, 0)->dest
 		  == ENTRY_BLOCK_PTR->next_bb));
 
+  from_bbs = BITMAP_ALLOC (NULL);
+  if (flag_ira_hoist_pressure)
+    hoisted_bbs = BITMAP_ALLOC (NULL);
+
   dom_tree_walk = get_all_dominated_blocks (CDI_DOMINATORS,
 					    ENTRY_BLOCK_PTR->next_bb);
 
@@ -2977,12 +3076,12 @@ hoist_code (void)
 	{
 	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
+	      int nregs = 0;
+	      enum reg_class pressure_class = NO_REGS;
 	      /* Current expression.  */
 	      struct expr *expr = index_map[i];
 	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
 	      int hoistable = 0;
-	      /* Basic blocks that have occurrences reachable from BB.  */
-	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
 	      /* Occurrences reachable from BB.  */
 	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
 	      /* We want to insert the expression into BB only once, so
@@ -2990,8 +3089,6 @@ hoist_code (void)
 	      int insn_inserted_p;
 	      occr_t occr;
 
-	      bitmap_initialize (from_bbs, 0);
-
 	      /* If an expression is computed in BB and is available at end of
 		 BB, hoist all occurrences dominated by BB to BB.  */
 	      if (TEST_BIT (comp[bb->index], i))
@@ -3045,13 +3142,18 @@ hoist_code (void)
 		    max_distance += (bb_size[dominated->index]
 				     - to_bb_head[INSN_UID (occr->insn)]);
 
-		  /* Note if the expression would reach the dominated block
-		     unimpared if it was placed at the end of BB.
+		  pressure_class = get_pressure_class_and_nregs (occr->insn,
+								 &nregs);
 
+		  /* Note if the expression should be hoisted from the dominated
+		     block to BB if it can reach DOMINATED unimpared.
+
 		     Keep track of how many times this expression is hoistable
 		     from a dominated block into BB.  */
-		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
-						 max_distance, bb_size))
+		  if (should_hoist_expr_to_dom (bb, expr, dominated, NULL,
+						max_distance, bb_size,
+						pressure_class,	&nregs,
+						hoisted_bbs))
 		    {
 		      hoistable++;
 		      VEC_safe_push (occr_t, heap,
@@ -3072,6 +3174,13 @@ hoist_code (void)
 		 to nullify any benefit we get from code hoisting.  */
 	      if (hoistable > 1 && dbg_cnt (hoist_insn))
 		{
+		  /* Update register pressure for basic block to which expr
+		     is hoisted.  */
+		  if (flag_ira_hoist_pressure)
+		    {
+		      data = BB_DATA (bb);
+		      data->max_reg_pressure[pressure_class] += nregs;
+		    }
 		  /* If (hoistable != VEC_length), then there is
 		     an occurrence of EXPR in BB itself.  Don't waste
 		     time looking for LCA in this case.  */
@@ -3089,8 +3198,20 @@ hoist_code (void)
 		    }
 		}
 	      else
-		/* Punt, no point hoisting a single occurence.  */
-		VEC_free (occr_t, heap, occrs_to_hoist);
+		{
+		  /* Punt, no point hoisting a single occurence.  */
+		  VEC_free (occr_t, heap, occrs_to_hoist);
+		  /* Restore register pressure of basic block recorded in
+ 		     hoisted_bbs when expr will not be hoisted.  */
+		  if (flag_ira_hoist_pressure)
+		    EXECUTE_IF_SET_IN_BITMAP (hoisted_bbs, 0, k, bi)
+		      {
+			data = BB_DATA (BASIC_BLOCK (k));
+			data->max_reg_pressure[pressure_class] -= nregs;
+		      }
+		}
+	      if (flag_ira_hoist_pressure)
+		bitmap_clear (hoisted_bbs);
 
 	      insn_inserted_p = 0;
 
@@ -3140,6 +3261,10 @@ hoist_code (void)
     }
 
   VEC_free (basic_block, heap, dom_tree_walk);
+  BITMAP_FREE (from_bbs);
+  if (flag_ira_hoist_pressure)
+    BITMAP_FREE (hoisted_bbs);
+
   free (bb_size);
   free (to_bb_head);
   free (index_map);
@@ -3147,6 +3272,165 @@ hoist_code (void)
   return changed;
 }
 
+/* Return pressure class and number of needed hard registers (through
+   *NREGS) of register REGNO.  */
+static enum reg_class
+get_regno_pressure_class (int regno, int *nregs)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    {
+      enum reg_class pressure_class;
+
+      pressure_class = reg_allocno_class (regno);
+      pressure_class = ira_pressure_class_translate[pressure_class];
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
+      return pressure_class;
+    }
+  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+	   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
+    {
+      *nregs = 1;
+      return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
+    }
+  else
+    {
+      *nregs = 0;
+      return NO_REGS;
+    }
+}
+
+/* Return pressure class and number of hard registers (through *NREGS)
+   for destination of INSN. */
+static enum reg_class
+get_pressure_class_and_nregs (rtx insn, int *nregs)
+{
+  rtx reg;
+  enum reg_class pressure_class;
+  rtx set = single_set (insn);
+
+  /* Considered invariant insns have only one set.  */
+  gcc_assert (set != NULL_RTX);
+  reg = SET_DEST (set);
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  if (MEM_P (reg))
+    {
+      *nregs = 0;
+      pressure_class = NO_REGS;
+    }
+  else
+    {
+      gcc_assert (REG_P (reg));
+      pressure_class = reg_allocno_class (REGNO (reg));
+      pressure_class = ira_pressure_class_translate[pressure_class];
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
+    }
+  return pressure_class;
+}
+
+/* Increase (if INCR_P) or decrease current register pressure for
+   register REGNO.  */
+static void
+change_pressure (int regno, bool incr_p)
+{
+  int nregs;
+  enum reg_class pressure_class;
+
+  pressure_class = get_regno_pressure_class (regno, &nregs);
+  if (! incr_p)
+    curr_reg_pressure[pressure_class] -= nregs;
+  else
+    {
+      curr_reg_pressure[pressure_class] += nregs;
+      if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  < curr_reg_pressure[pressure_class])
+	BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  = curr_reg_pressure[pressure_class];
+    }
+}
+
+/* Calculate register pressure for each basic block by walking insns
+   from last to first.  */
+static void
+calculate_bb_reg_pressure (void)
+{
+  int i;
+  unsigned int j;
+  rtx insn;
+  basic_block bb;
+  bitmap curr_regs_live;
+  bitmap_iterator bi;
+
+
+  ira_setup_eliminable_regset ();
+  curr_regs_live = BITMAP_ALLOC (&reg_obstack);
+  FOR_EACH_BB (bb)
+    {
+      curr_bb = bb;
+      bitmap_copy (curr_regs_live, DF_LR_OUT (bb));
+      for (i = 0; i < ira_pressure_classes_num; i++)
+	curr_reg_pressure[ira_pressure_classes[i]] = 0;
+      EXECUTE_IF_SET_IN_BITMAP (curr_regs_live, 0, j, bi)
+	change_pressure (j, true);
+
+      FOR_BB_INSNS_REVERSE (bb, insn)
+	{
+	  rtx dreg;
+	  int regno;
+	  df_ref *def_rec, *use_rec;
+
+	  if (! NONDEBUG_INSN_P (insn))
+	    continue;
+
+	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+	    {
+	      dreg = DF_REF_REAL_REG (*def_rec);
+	      gcc_assert (REG_P (dreg));
+	      regno = REGNO (dreg);
+	      if (!(DF_REF_FLAGS (*def_rec) 
+		    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+		{
+		  if (bitmap_clear_bit (curr_regs_live, regno))
+		    change_pressure (regno, false);
+		}
+	    }
+
+	  for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+	    {
+	      dreg = DF_REF_REAL_REG (*use_rec);
+	      gcc_assert (REG_P (dreg));
+	      regno = REGNO (dreg);
+	      if (bitmap_set_bit (curr_regs_live, regno))
+		change_pressure (regno, true);
+	    }
+	}
+    }
+  BITMAP_FREE (curr_regs_live);
+
+  if (dump_file == NULL)
+    return;
+
+  fprintf (dump_file, "\nRegister Pressure: \n");
+  FOR_EACH_BB (bb)
+    {
+      fprintf (dump_file, "  Basic block %d: \n", bb->index);
+      for (i = 0; (int) i < ira_pressure_classes_num; i++)
+	{
+	  enum reg_class pressure_class;
+
+	  pressure_class = ira_pressure_classes[i];
+	  if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0)
+	    continue;
+
+	  fprintf (dump_file, "    %s=%d\n", reg_class_names[pressure_class],
+		   BB_DATA (bb)->max_reg_pressure[pressure_class]);
+	}
+    }
+  fprintf (dump_file, "\n");
+}
+
 /* Top level routine to perform one code hoisting (aka unification) pass
 
    Return nonzero if a change was made.  */
@@ -3166,6 +3450,16 @@ one_code_hoisting_pass (void)
 
   doing_code_hoisting_p = true;
 
+  /* Calculate register pressure for each basic block.  */
+  if (flag_ira_hoist_pressure)
+    {
+      regstat_init_n_sets_and_refs ();
+      ira_set_pseudo_classes (false, dump_file);
+      alloc_aux_for_blocks (sizeof (struct bb_data));
+      calculate_bb_reg_pressure ();
+      regstat_free_n_sets_and_refs ();
+    }
+
   /* We need alias.  */
   init_alias_analysis ();
 
@@ -3186,6 +3480,11 @@ one_code_hoisting_pass (void)
       free_code_hoist_mem ();
     }
 
+  if (flag_ira_hoist_pressure)
+    {
+      free_aux_for_blocks ();
+      free_reg_info ();
+    }
   free_hash_table (&expr_hash_table);
   free_gcse_mem ();
   obstack_free (&gcse_obstack, NULL);
Index: gcc/loop-invariant.c
===================================================================
--- gcc/loop-invariant.c	(revision 192194)
+++ gcc/loop-invariant.c	(working copy)
@@ -1,5 +1,5 @@
 /* RTL-level loop invariant motion.
-   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012
    Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -1915,7 +1915,7 @@ move_loop_invariants (void)
     {
       df_analyze ();
       regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (dump_file);
+      ira_set_pseudo_classes (true, dump_file);
       calculate_loop_reg_pressure ();
       regstat_free_n_sets_and_refs ();
     }
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 192194)
+++ gcc/common.opt	(working copy)
@@ -1392,6 +1392,11 @@ Enum(ira_region) String(all) Value(IRA_REGION_ALL)
 EnumValue
 Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
 
+fira-hoist-pressure
+Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization
+Use IRA based register pressure calculation
+in RTL hoist optimizations.
+
 fira-loop-pressure
 Common Report Var(flag_ira_loop_pressure)
 Use IRA based register pressure calculation
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	(revision 192194)
+++ gcc/ira.c	(working copy)
@@ -4183,7 +4183,7 @@ ira (FILE *f)
   crtl->is_leaf = leaf_function_p ();
 
   if (resize_reg_info () && flag_ira_loop_pressure)
-    ira_set_pseudo_classes (ira_dump_file);
+    ira_set_pseudo_classes (true, ira_dump_file);
 
   rebuild_p = update_equiv_regs ();
 
Index: gcc/ira.h
===================================================================
--- gcc/ira.h	(revision 192194)
+++ gcc/ira.h	(working copy)
@@ -1,6 +1,6 @@
 /* Communication between the Integrated Register Allocator (IRA) and
    the rest of the compiler.
-   Copyright (C) 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2006, 2007, 2008, 2009, 2010, 2012
    Free Software Foundation, Inc.
    Contributed by Vladimir Makarov <vmakarov@redhat.com>.
 
@@ -131,7 +131,7 @@ extern void ira_init (void);
 extern void ira_finish_once (void);
 extern void ira_setup_eliminable_regset (void);
 extern rtx ira_eliminate_regs (rtx, enum machine_mode);
-extern void ira_set_pseudo_classes (FILE *);
+extern void ira_set_pseudo_classes (bool, FILE *);
 extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
 
 extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	(revision 192194)
+++ gcc/ira-costs.c	(working copy)
@@ -2048,9 +2048,10 @@ ira_costs (void)
   ira_free (total_allocno_costs);
 }
 
-/* Entry function which defines classes for pseudos.  */
+/* Entry function which defines classes for pseudos.
+   Set pseudo_classes_defined_p only if DEFINE_PSEUDO_CLASSES is true.  */
 void
-ira_set_pseudo_classes (FILE *dump_file)
+ira_set_pseudo_classes (bool define_pseudo_classes, FILE *dump_file)
 {
   allocno_p = false;
   internal_flag_ira_verbose = flag_ira_verbose;
@@ -2059,7 +2060,9 @@ void
   initiate_regno_cost_classes ();
   find_costs_and_classes (dump_file);
   finish_regno_cost_classes ();
-  pseudo_classes_defined_p = true;
+  if (define_pseudo_classes)
+    pseudo_classes_defined_p = true;
+
   finish_costs ();
 }
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFA] Implement register pressure directed hoist pass
       [not found]       ` <50766d69.a853420a.2475.fffffbafSMTPIN_ADDED@mx.google.com>
@ 2012-10-11 23:19         ` Steven Bosscher
  2012-10-12  3:34           ` Bin Cheng
  2012-10-12  8:20           ` Bin Cheng
  0 siblings, 2 replies; 16+ messages in thread
From: Steven Bosscher @ 2012-10-11 23:19 UTC (permalink / raw)
  To: Bin Cheng; +Cc: Jeff Law, gcc-patches

On Thu, Oct 11, 2012 at 8:50 AM, Bin Cheng wrote:

> +  /* x+y won't be hoisted without defaultly enabled "-fira-hoist-pressure",

defaulty comment.

> +     kinds of code motion(including code hoisting) in a unified way.

needs space between "motion" and "(".

> +   flow graph, given if it can reach BB unimpared.  Stop the search if the

s/given if/if/


> +should_hoist_expr_to_dom (basic_block expr_bb, struct expr *expr,
> +			  basic_block bb, sbitmap visited, int distance,
> +			  int *bb_size, enum reg_class pressure_class,
> +			  int *nregs, bitmap hoisted_bbs)

At least BB_SIZE and DISTANCE are not documented (also not before your
patch). While there, can you document these please?


> +	 becasue hoisting constant expr aggresively results in worse code
> +	 as observed.  */

s/becasue/because/
s/aggresively/aggressively/

Not sure what you mean with "as observed". Observed where/how?

> -      visited = XCNEWVEC (char, last_basic_block);
> +      visited = sbitmap_alloc (last_basic_block);
> +      sbitmap_zero (visited);

Send a separate patch for this change please. Really. Reviewing
patches is much easier if you post things one change at a time.


> +   The code hoisting pass hoists multiple computations of expression
> +   to dominator basic block, like from b2/b3 to b1 as depicted below:

"The code hoisting pass can hoist multiple computations of the same
expression along dominated paths to a dominating basic block, like
from ..."


> +   Unfortunately code hoisting generally extend live range of output

s/extend/extends the/
s/of output/of an output/


> +   from rtx cost of corresponding expression and it's used to control

s/of corresponding/of the corresponding/
Similarly elsewhere in comments.


> +	of shrinked live range of input pseudo register when hoisting an

s/range/ranges/
s/register/registers/
After all this is only possible, AFAICT, if there's more than one
input register.

I'll leave all the algorithmic stuff to Jeff...

Ciao!
Steven

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-10-11 23:19         ` Steven Bosscher
@ 2012-10-12  3:34           ` Bin Cheng
  2012-10-12  8:20           ` Bin Cheng
  1 sibling, 0 replies; 16+ messages in thread
From: Bin Cheng @ 2012-10-12  3:34 UTC (permalink / raw)
  To: 'Steven Bosscher'; +Cc: Jeff Law, gcc-patches



> -----Original Message-----
> From: Steven Bosscher [mailto:stevenb.gcc@gmail.com]
> Sent: Friday, October 12, 2012 7:04 AM
> To: Bin Cheng
> Cc: Jeff Law; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass
> 
> On Thu, Oct 11, 2012 at 8:50 AM, Bin Cheng wrote:
> 
> > +  /* x+y won't be hoisted without defaultly enabled
> > + "-fira-hoist-pressure",
> 
> defaulty comment.
> 
> > +     kinds of code motion(including code hoisting) in a unified way.
> 
> needs space between "motion" and "(".
> 
> > +   flow graph, given if it can reach BB unimpared.  Stop the search
> > + if the
> 
> s/given if/if/
> 
> 
> > +should_hoist_expr_to_dom (basic_block expr_bb, struct expr *expr,
> > +			  basic_block bb, sbitmap visited, int distance,
> > +			  int *bb_size, enum reg_class pressure_class,
> > +			  int *nregs, bitmap hoisted_bbs)
> 
> At least BB_SIZE and DISTANCE are not documented (also not before your
patch).
> While there, can you document these please?
> 
> 
> > +	 becasue hoisting constant expr aggresively results in worse code
> > +	 as observed.  */
> 
> s/becasue/because/
> s/aggresively/aggressively/
> 
> Not sure what you mean with "as observed". Observed where/how?
> 
> > -      visited = XCNEWVEC (char, last_basic_block);
> > +      visited = sbitmap_alloc (last_basic_block);
> > +      sbitmap_zero (visited);
> 
> Send a separate patch for this change please. Really. Reviewing patches is
> much easier if you post things one change at a time.
> 
> 
> > +   The code hoisting pass hoists multiple computations of expression
> > +   to dominator basic block, like from b2/b3 to b1 as depicted below:
> 
> "The code hoisting pass can hoist multiple computations of the same
expression
> along dominated paths to a dominating basic block, like from ..."
> 
> 
> > +   Unfortunately code hoisting generally extend live range of output
> 
> s/extend/extends the/
> s/of output/of an output/
> 
> 
> > +   from rtx cost of corresponding expression and it's used to control
> 
> s/of corresponding/of the corresponding/ Similarly elsewhere in comments.
> 
> 
> > +	of shrinked live range of input pseudo register when hoisting an
> 
> s/range/ranges/
> s/register/registers/
> After all this is only possible, AFAICT, if there's more than one input
> register.
> 
> I'll leave all the algorithmic stuff to Jeff...

Very sorry I didn't realize there are so many language errors. Since English
is not my native language, I will pay more attention to comments in the
future.

Thanks very much for your patient.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-10-11 23:19         ` Steven Bosscher
  2012-10-12  3:34           ` Bin Cheng
@ 2012-10-12  8:20           ` Bin Cheng
  2012-10-16  8:50             ` Bin Cheng
  1 sibling, 1 reply; 16+ messages in thread
From: Bin Cheng @ 2012-10-12  8:20 UTC (permalink / raw)
  To: 'Steven Bosscher'; +Cc: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1971 bytes --]

Hi,
This is the updated patches split from original one according to Steven's
suggestion. Also fixed spelling errors.

Apart from this, I also implemented a draft patch simulating register
pressure accurately during hoisting, unfortunately the size data isn't
better than this patch. If it's right, this can prove my previous
observation that decrease of register pressure during hoisting process is
rare. I will continue investigating the correctness of the patch and see
what I can get.

Please review. Thanks

And the ChangeLog:

2012-10-12  Bin Cheng  <bin.cheng@arm.com>

	* gcse.c: Update copyright dates.
	(hoist_expr_reaches_here_p): Change parameter type from char *
	to sbitmap.

2012-10-12  Bin Cheng  <bin.cheng@arm.com>

	* common.opt (flag_ira_hoist_pressure): New.
	* doc/invoke.texi (-fira-hoist-pressure): Describe.
	* ira-costs.c (ira_set_pseudo_classes): New parameter.
	* ira.h: Update copyright dates.
	(ira_set_pseudo_classes): Update prototype.
	* haifa-sched.c (sched_init): Update call.
	* ira.c (ira): Update call.
	* regmove.c: Update copyright dates.
	(regmove_optimize): Update call.
	* loop-invariant.c: Update copyright dates.
	(move_loop_invariants): Update call.
	* gcse.c: (struct bb_data): New structure.
	(BB_DATA): New macro.
	(curr_bb, curr_reg_pressure): New static variables.
	(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
	Change parameter expr_index to expr.
	New parameters pressure_class, nregs and hoisted_bbs.
	Use reg pressure to determine the distance expr can be hoisted.
	(hoist_code): Use reg pressure to direct the hoist process.
	(get_regno_pressure_class, get_pressure_class_and_nregs)
	(change_pressure, calculate_bb_reg_pressure): New.
	(one_code_hoisting_pass): Calculate register pressure. Allocate
	and free data.

gcc/testsuite/ChangeLog
2012-10-12  Bin Cheng  <bin.cheng@arm.com>

	* testsuite/gcc.dg/hoist-register-pressure.c: New test.

[-- Attachment #2: hoist-reg-pressure-20121012-001.txt --]
[-- Type: text/plain, Size: 2337 bytes --]

Index: gcc/gcse.c
===================================================================
--- gcc/gcse.c	(revision 192194)
+++ gcc/gcse.c	(working copy)
@@ -1,6 +1,6 @@
 /* Partial redundancy elimination / Hoisting for RTL.
    Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
-   2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -460,7 +460,7 @@ static void alloc_code_hoist_mem (int, int);
 static void free_code_hoist_mem (void);
 static void compute_code_hoist_vbeinout (void);
 static void compute_code_hoist_data (void);
-static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *,
+static int hoist_expr_reaches_here_p (basic_block, int, basic_block, sbitmap,
 				      int, int *);
 static int hoist_code (void);
 static int one_code_hoisting_pass (void);
@@ -2842,7 +2842,7 @@ compute_code_hoist_data (void)
 
 static int
 hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
-			   char *visited, int distance, int *bb_size)
+			   sbitmap visited, int distance, int *bb_size)
 {
   edge pred;
   edge_iterator ei;
@@ -2863,7 +2863,8 @@ hoist_expr_reaches_here_p (basic_block expr_bb, in
   if (visited == NULL)
     {
       visited_allocated_locally = 1;
-      visited = XCNEWVEC (char, last_basic_block);
+      visited = sbitmap_alloc (last_basic_block);
+      sbitmap_zero (visited);
     }
 
   FOR_EACH_EDGE (pred, ei, bb->preds)
@@ -2874,7 +2875,7 @@ hoist_expr_reaches_here_p (basic_block expr_bb, in
 	break;
       else if (pred_bb == expr_bb)
 	continue;
-      else if (visited[pred_bb->index])
+      else if (TEST_BIT (visited, pred_bb->index))
 	continue;
 
       else if (! TEST_BIT (transp[pred_bb->index], expr_index))
@@ -2883,14 +2884,14 @@ hoist_expr_reaches_here_p (basic_block expr_bb, in
       /* Not killed.  */
       else
 	{
-	  visited[pred_bb->index] = 1;
+	  SET_BIT (visited, pred_bb->index);
 	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
 					   visited, distance, bb_size))
 	    break;
 	}
     }
   if (visited_allocated_locally)
-    free (visited);
+    sbitmap_free (visited);
 
   return (pred == NULL);
 }

[-- Attachment #3: hoist-reg-pressure-20121012-002.txt --]
[-- Type: text/plain, Size: 33304 bytes --]

diff -x .svn -rpN wc1/gcc/common.opt wc2/gcc/common.opt
*** wc1/gcc/common.opt	2012-10-12 15:13:41.679617846 +0800
--- wc2/gcc/common.opt	2012-10-12 15:12:15.614174292 +0800
*************** Enum(ira_region) String(all) Value(IRA_R
*** 1392,1397 ****
--- 1392,1402 ----
  EnumValue
  Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
  
+ fira-hoist-pressure
+ Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization
+ Use IRA based register pressure calculation
+ in RTL hoist optimizations.
+ 
  fira-loop-pressure
  Common Report Var(flag_ira_loop_pressure)
  Use IRA based register pressure calculation
diff -x .svn -rpN wc1/gcc/doc/invoke.texi wc2/gcc/doc/invoke.texi
*** wc1/gcc/doc/invoke.texi	2012-10-12 15:13:40.282479595 +0800
--- wc2/gcc/doc/invoke.texi	2012-10-12 15:12:15.622110427 +0800
*************** Objective-C and Objective-C++ Dialects}.
*** 372,378 ****
  -finline-small-functions -fipa-cp -fipa-cp-clone @gol
  -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
  -fira-algorithm=@var{algorithm} @gol
! -fira-region=@var{region} @gol
  -fira-loop-pressure -fno-ira-share-save-slots @gol
  -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
  -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
--- 372,378 ----
  -finline-small-functions -fipa-cp -fipa-cp-clone @gol
  -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
  -fira-algorithm=@var{algorithm} @gol
! -fira-region=@var{region} -fira-hoist-pressure @gol
  -fira-loop-pressure -fno-ira-share-save-slots @gol
  -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
  -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
*************** This typically results in the smallest c
*** 6996,7001 ****
--- 6996,7009 ----
  
  @end table
  
+ @item -fira-hoist-pressure
+ @opindex fira-hoist-pressure
+ Use IRA to evaluate register pressure in the code hoisting pass for
+ decisions to hoist expressions.  This option usually results in smaller
+ code, but it can slow the compiler down.
+ 
+ This option is enabled at level @option{-Os} for all targets.
+ 
  @item -fira-loop-pressure
  @opindex fira-loop-pressure
  Use IRA to evaluate register pressure in loops for decisions to move
diff -x .svn -rpN wc1/gcc/gcse.c wc2/gcc/gcse.c
*** wc1/gcc/gcse.c	2012-10-12 15:45:12.822998377 +0800
--- wc2/gcc/gcse.c	2012-10-12 15:31:45.062189749 +0800
*************** along with GCC; see the file COPYING3.  
*** 20,28 ****
  
  /* TODO
     - reordering of memory allocation and freeing to be more space efficient
!    - do rough calc of how many regs are needed in each block, and a rough
!      calc of how many regs are available in each class and use that to
!      throttle back the code in cases where RTX_COST is minimal.
  */
  
  /* References searched while implementing this.
--- 20,30 ----
  
  /* TODO
     - reordering of memory allocation and freeing to be more space efficient
!    - simulate register pressure change of each basic block accurately during
!      hoist process.  But I doubt the benefit since most expressions hoisted
!      are constant or address, which usually won't reduce register pressure.
!    - calc rough register pressure information and use the info to drive all
!      kinds of code motion (including code hoisting) in a unified way.
  */
  
  /* References searched while implementing this.
*************** along with GCC; see the file COPYING3.  
*** 141,151 ****
  #include "diagnostic-core.h"
  #include "toplev.h"
  
  #include "rtl.h"
  #include "tree.h"
  #include "tm_p.h"
  #include "regs.h"
! #include "hard-reg-set.h"
  #include "flags.h"
  #include "insn-config.h"
  #include "recog.h"
--- 143,154 ----
  #include "diagnostic-core.h"
  #include "toplev.h"
  
+ #include "hard-reg-set.h"
  #include "rtl.h"
  #include "tree.h"
  #include "tm_p.h"
  #include "regs.h"
! #include "ira.h"
  #include "flags.h"
  #include "insn-config.h"
  #include "recog.h"
*************** static bool doing_code_hoisting_p = fals
*** 412,417 ****
--- 415,436 ----
  /* For available exprs */
  static sbitmap *ae_kill;
  \f
+ /* Data stored for each basic block.  */
+ struct bb_data
+ {
+   /* Maximal register pressure inside basic block for given register class
+      (defined only for the pressure classes).  */
+   int max_reg_pressure[N_REG_CLASSES];
+ };
+ 
+ #define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
+ 
+ static basic_block curr_bb;
+ 
+ /* Currently register pressure for each pressure class.  */
+ static int curr_reg_pressure[N_REG_CLASSES];
+ \f
+ 
  static void compute_can_copy (void);
  static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
  static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
*************** static void alloc_code_hoist_mem (int, i
*** 460,468 ****
  static void free_code_hoist_mem (void);
  static void compute_code_hoist_vbeinout (void);
  static void compute_code_hoist_data (void);
! static int hoist_expr_reaches_here_p (basic_block, int, basic_block, sbitmap,
! 				      int, int *);
  static int hoist_code (void);
  static int one_code_hoisting_pass (void);
  static rtx process_insert_insn (struct expr *);
  static int pre_edge_insert (struct edge_list *, struct expr **);
--- 479,489 ----
  static void free_code_hoist_mem (void);
  static void compute_code_hoist_vbeinout (void);
  static void compute_code_hoist_data (void);
! static int should_hoist_expr_to_dom (basic_block, struct expr *, basic_block,
! 				     sbitmap, int, int *, enum reg_class,
! 				     int *, bitmap);
  static int hoist_code (void);
+ static enum reg_class get_pressure_class_and_nregs (rtx insn, int *nregs);
  static int one_code_hoisting_pass (void);
  static rtx process_insert_insn (struct expr *);
  static int pre_edge_insert (struct edge_list *, struct expr **);
*************** prune_expressions (bool pre_p)
*** 1857,1863 ****
  	 a basic block we should account for any side-effects of a subsequent
  	 jump instructions that could clobber the expression.  It would
  	 be best to implement this check along the lines of
! 	 hoist_expr_reaches_here_p where the target block is already known
  	 and, hence, there's no need to conservatively prune expressions on
  	 "intermediate" set-and-jump instructions.  */
        FOR_EACH_EDGE (e, ei, bb->preds)
--- 1878,1884 ----
  	 a basic block we should account for any side-effects of a subsequent
  	 jump instructions that could clobber the expression.  It would
  	 be best to implement this check along the lines of
! 	 should_hoist_expr_to_dom where the target block is already known
  	 and, hence, there's no need to conservatively prune expressions on
  	 "intermediate" set-and-jump instructions.  */
        FOR_EACH_EDGE (e, ei, bb->preds)
*************** compute_code_hoist_data (void)
*** 2825,2834 ****
      fprintf (dump_file, "\n");
  }
  
! /* Determine if the expression identified by EXPR_INDEX would
!    reach BB unimpared if it was placed at the end of EXPR_BB.
!    Stop the search if the expression would need to be moved more
!    than DISTANCE instructions.
  
     It's unclear exactly what Muchnick meant by "unimpared".  It seems
     to me that the expression must either be computed or transparent in
--- 2846,2866 ----
      fprintf (dump_file, "\n");
  }
  
! /* Determine if the expression EXPR should be hoisted to EXPR_BB up in
!    flow graph, if it can reach BB unimpared.  Stop the search if the
!    expression would need to be moved more than DISTANCE instructions.
! 
!    DISTANCE is the number of instructions through which EXPR can be
!    hoisted up in flow graph.
! 
!    BB_SIZE points to an array which contains the number of instructions
!    for each basic block.
! 
!    PRESSURE_CLASS and NREGS are register class and number of hard registers
!    for storing EXPR.
! 
!    HOISTED_BBS points to a bitmap indicating basic blocks through which
!    EXPR is hoisted.
  
     It's unclear exactly what Muchnick meant by "unimpared".  It seems
     to me that the expression must either be computed or transparent in
*************** compute_code_hoist_data (void)
*** 2841,2858 ****
     paths.  */
  
  static int
! hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
! 			   sbitmap visited, int distance, int *bb_size)
  {
    edge pred;
    edge_iterator ei;
    int visited_allocated_locally = 0;
  
    /* Terminate the search if distance, for which EXPR is allowed to move,
       is exhausted.  */
    if (distance > 0)
      {
!       distance -= bb_size[bb->index];
  
        if (distance <= 0)
  	return 0;
--- 2873,2904 ----
     paths.  */
  
  static int
! should_hoist_expr_to_dom (basic_block expr_bb, struct expr *expr,
! 			  basic_block bb, sbitmap visited, int distance,
! 			  int *bb_size, enum reg_class pressure_class,
! 			  int *nregs, bitmap hoisted_bbs)
  {
+   unsigned int i;
    edge pred;
    edge_iterator ei;
+   sbitmap_iterator sbi;
    int visited_allocated_locally = 0;
  
    /* Terminate the search if distance, for which EXPR is allowed to move,
       is exhausted.  */
    if (distance > 0)
      {
!       /* Let EXPR be hoisted through basic block at no cost if the block
! 	 has low register pressure.  An exception is constant expression,
! 	 because hoisting constant expr aggressively results in worse code.
! 	 The exception is made by the observation of CSiBE on ARM target,
! 	 while it has no obvious effect on other targets like x86, x86_64,
! 	 mips and powerpc.  */
!       if (!flag_ira_hoist_pressure
! 	  || (BB_DATA (bb)->max_reg_pressure[pressure_class]
! 		>= ira_class_hard_regs_num[pressure_class]
! 	      || CONST_INT_P (expr->expr)))
! 	distance -= bb_size[bb->index];
  
        if (distance <= 0)
  	return 0;
*************** hoist_expr_reaches_here_p (basic_block e
*** 2877,2897 ****
  	continue;
        else if (TEST_BIT (visited, pred_bb->index))
  	continue;
! 
!       else if (! TEST_BIT (transp[pred_bb->index], expr_index))
  	break;
- 
        /* Not killed.  */
        else
  	{
  	  SET_BIT (visited, pred_bb->index);
! 	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
! 					   visited, distance, bb_size))
  	    break;
  	}
      }
    if (visited_allocated_locally)
!     sbitmap_free (visited);
  
    return (pred == NULL);
  }
--- 2923,2957 ----
  	continue;
        else if (TEST_BIT (visited, pred_bb->index))
  	continue;
!       else if (! TEST_BIT (transp[pred_bb->index], expr->bitmap_index))
  	break;
        /* Not killed.  */
        else
  	{
  	  SET_BIT (visited, pred_bb->index);
! 	  if (! should_hoist_expr_to_dom (expr_bb, expr, pred_bb,
! 					  visited, distance, bb_size,
! 					  pressure_class, nregs, hoisted_bbs))
  	    break;
  	}
      }
    if (visited_allocated_locally)
!     {
!       /* If EXPR can be hoisted to expr_bb, record basic blocks through
! 	 which EXPR is hoisted in hoisted_bbs.  Also update register
! 	 pressure for basic blocks newly added in hoisted_bbs.  */
!       if (flag_ira_hoist_pressure && !pred)
! 	{
! 	  EXECUTE_IF_SET_IN_SBITMAP (visited, 0, i, sbi)
! 	    if (!bitmap_bit_p (hoisted_bbs, i))
! 	      {
! 		bitmap_set_bit (hoisted_bbs, i);
! 		BB_DATA (BASIC_BLOCK (i))->max_reg_pressure[pressure_class]
! 		    += *nregs;
! 	      }
! 	}
!       sbitmap_free (visited);
!     }
  
    return (pred == NULL);
  }
*************** find_occr_in_bb (struct occr *occr, basi
*** 2908,2914 ****
    return occr;
  }
  
! /* Actually perform code hoisting.  */
  
  static int
  hoist_code (void)
--- 2968,3011 ----
    return occr;
  }
  
! /* Actually perform code hoisting.
! 
!    The code hoisting pass can hoist multiple computations of the same
!    expression along dominated path to a dominating basic block, like
!    from b2/b3 to b1 as depicted below:
! 
!           b1      ------
!           /\         |
!          /  \        |
!         bx   by   distance
!        /      \      |
!       /        \     |
!      b2        b3 ------
! 
!    Unfortunately code hoisting generally extends the live range of an
!    output pseudo register, which increases register pressure and hurts
!    register allocation.  To address this issue, an attribute MAX_DISTANCE
!    is computed and attached to each expression.  The attribute is computed
!    from rtx cost of the corresponding expression and it's used to control
!    how long the expression can be hoisted up in flow graph.  As the
!    expression is hoisted up in flow graph, GCC decreases its DISTANCE
!    and stops the hoist if DISTANCE reaches 0.
! 
!    Option "-fira-hoist-pressure" implements register pressure directed
!    hoist based on upper method.  The rationale is:
!      1. Calculate register pressure for each basic block by reusing IRA
! 	facility.
!      2. When expression is hoisted through one basic block, GCC checks
! 	register pressure of the basic block and decrease DISTANCE only
! 	when the register pressure is high.  In other words, expression
! 	will be hoisted through basic block with low register pressure
! 	at no cost.
!      3. Update register pressure information for basic blocks through
!  	which expression is hoisted.
! 	TODO: It is possible to have register pressure decreased because
! 	of shrinked live ranges of input pseudo registers when hoisting
! 	an expression.  For now, this effect is not simulated and we just
! 	increase register pressure for hoisted expressions.  */
  
  static int
  hoist_code (void)
*************** hoist_code (void)
*** 2917,2928 ****
    VEC (basic_block, heap) *dom_tree_walk;
    unsigned int dom_tree_walk_index;
    VEC (basic_block, heap) *domby;
!   unsigned int i,j;
    struct expr **index_map;
    struct expr *expr;
    int *to_bb_head;
    int *bb_size;
    int changed = 0;
  
    /* Compute a mapping from expression number (`bitmap_index') to
       hash table entry.  */
--- 3014,3031 ----
    VEC (basic_block, heap) *dom_tree_walk;
    unsigned int dom_tree_walk_index;
    VEC (basic_block, heap) *domby;
!   unsigned int i, j, k;
    struct expr **index_map;
    struct expr *expr;
    int *to_bb_head;
    int *bb_size;
    int changed = 0;
+   struct bb_data *data;
+   /* Basic blocks that have occurrences reachable from BB.  */
+   bitmap from_bbs;
+   /* Basic blocks through which expr is hoisted.  */
+   bitmap hoisted_bbs = NULL;
+   bitmap_iterator bi;
  
    /* Compute a mapping from expression number (`bitmap_index') to
       hash table entry.  */
*************** hoist_code (void)
*** 2960,2965 ****
--- 3063,3072 ----
  	      && (EDGE_SUCC (ENTRY_BLOCK_PTR, 0)->dest
  		  == ENTRY_BLOCK_PTR->next_bb));
  
+   from_bbs = BITMAP_ALLOC (NULL);
+   if (flag_ira_hoist_pressure)
+     hoisted_bbs = BITMAP_ALLOC (NULL);
+ 
    dom_tree_walk = get_all_dominated_blocks (CDI_DOMINATORS,
  					    ENTRY_BLOCK_PTR->next_bb);
  
*************** hoist_code (void)
*** 2978,2989 ****
  	{
  	  if (TEST_BIT (hoist_vbeout[bb->index], i))
  	    {
  	      /* Current expression.  */
  	      struct expr *expr = index_map[i];
  	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
  	      int hoistable = 0;
- 	      /* Basic blocks that have occurrences reachable from BB.  */
- 	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
  	      /* Occurrences reachable from BB.  */
  	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
  	      /* We want to insert the expression into BB only once, so
--- 3085,3096 ----
  	{
  	  if (TEST_BIT (hoist_vbeout[bb->index], i))
  	    {
+ 	      int nregs = 0;
+ 	      enum reg_class pressure_class = NO_REGS;
  	      /* Current expression.  */
  	      struct expr *expr = index_map[i];
  	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
  	      int hoistable = 0;
  	      /* Occurrences reachable from BB.  */
  	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
  	      /* We want to insert the expression into BB only once, so
*************** hoist_code (void)
*** 2991,2998 ****
  	      int insn_inserted_p;
  	      occr_t occr;
  
- 	      bitmap_initialize (from_bbs, 0);
- 
  	      /* If an expression is computed in BB and is available at end of
  		 BB, hoist all occurrences dominated by BB to BB.  */
  	      if (TEST_BIT (comp[bb->index], i))
--- 3098,3103 ----
*************** hoist_code (void)
*** 3046,3058 ****
  		    max_distance += (bb_size[dominated->index]
  				     - to_bb_head[INSN_UID (occr->insn)]);
  
! 		  /* Note if the expression would reach the dominated block
! 		     unimpared if it was placed at the end of BB.
  
  		     Keep track of how many times this expression is hoistable
  		     from a dominated block into BB.  */
! 		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
! 						 max_distance, bb_size))
  		    {
  		      hoistable++;
  		      VEC_safe_push (occr_t, heap,
--- 3151,3168 ----
  		    max_distance += (bb_size[dominated->index]
  				     - to_bb_head[INSN_UID (occr->insn)]);
  
! 		  pressure_class = get_pressure_class_and_nregs (occr->insn,
! 								 &nregs);
! 
! 		  /* Note if the expression should be hoisted from the dominated
! 		     block to BB if it can reach DOMINATED unimpared.
  
  		     Keep track of how many times this expression is hoistable
  		     from a dominated block into BB.  */
! 		  if (should_hoist_expr_to_dom (bb, expr, dominated, NULL,
! 						max_distance, bb_size,
! 						pressure_class,	&nregs,
! 						hoisted_bbs))
  		    {
  		      hoistable++;
  		      VEC_safe_push (occr_t, heap,
*************** hoist_code (void)
*** 3073,3078 ****
--- 3183,3195 ----
  		 to nullify any benefit we get from code hoisting.  */
  	      if (hoistable > 1 && dbg_cnt (hoist_insn))
  		{
+ 		  /* Update register pressure for basic block to which expr
+ 		     is hoisted.  */
+ 		  if (flag_ira_hoist_pressure)
+ 		    {
+ 		      data = BB_DATA (bb);
+ 		      data->max_reg_pressure[pressure_class] += nregs;
+ 		    }
  		  /* If (hoistable != VEC_length), then there is
  		     an occurrence of EXPR in BB itself.  Don't waste
  		     time looking for LCA in this case.  */
*************** hoist_code (void)
*** 3090,3097 ****
  		    }
  		}
  	      else
! 		/* Punt, no point hoisting a single occurence.  */
! 		VEC_free (occr_t, heap, occrs_to_hoist);
  
  	      insn_inserted_p = 0;
  
--- 3207,3226 ----
  		    }
  		}
  	      else
! 		{
! 		  /* Punt, no point hoisting a single occurence.  */
! 		  VEC_free (occr_t, heap, occrs_to_hoist);
! 		  /* Restore register pressure of basic block recorded in
!  		     hoisted_bbs when expr will not be hoisted.  */
! 		  if (flag_ira_hoist_pressure)
! 		    EXECUTE_IF_SET_IN_BITMAP (hoisted_bbs, 0, k, bi)
! 		      {
! 			data = BB_DATA (BASIC_BLOCK (k));
! 			data->max_reg_pressure[pressure_class] -= nregs;
! 		      }
! 		}
! 	      if (flag_ira_hoist_pressure)
! 		bitmap_clear (hoisted_bbs);
  
  	      insn_inserted_p = 0;
  
*************** hoist_code (void)
*** 3141,3146 ****
--- 3270,3279 ----
      }
  
    VEC_free (basic_block, heap, dom_tree_walk);
+   BITMAP_FREE (from_bbs);
+   if (flag_ira_hoist_pressure)
+     BITMAP_FREE (hoisted_bbs);
+ 
    free (bb_size);
    free (to_bb_head);
    free (index_map);
*************** hoist_code (void)
*** 3148,3153 ****
--- 3281,3445 ----
    return changed;
  }
  
+ /* Return pressure class and number of needed hard registers (through
+    *NREGS) of register REGNO.  */
+ static enum reg_class
+ get_regno_pressure_class (int regno, int *nregs)
+ {
+   if (regno >= FIRST_PSEUDO_REGISTER)
+     {
+       enum reg_class pressure_class;
+ 
+       pressure_class = reg_allocno_class (regno);
+       pressure_class = ira_pressure_class_translate[pressure_class];
+       *nregs
+ 	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
+       return pressure_class;
+     }
+   else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+ 	   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
+     {
+       *nregs = 1;
+       return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
+     }
+   else
+     {
+       *nregs = 0;
+       return NO_REGS;
+     }
+ }
+ 
+ /* Return pressure class and number of hard registers (through *NREGS)
+    for destination of INSN. */
+ static enum reg_class
+ get_pressure_class_and_nregs (rtx insn, int *nregs)
+ {
+   rtx reg;
+   enum reg_class pressure_class;
+   rtx set = single_set (insn);
+ 
+   /* Considered invariant insns have only one set.  */
+   gcc_assert (set != NULL_RTX);
+   reg = SET_DEST (set);
+   if (GET_CODE (reg) == SUBREG)
+     reg = SUBREG_REG (reg);
+   if (MEM_P (reg))
+     {
+       *nregs = 0;
+       pressure_class = NO_REGS;
+     }
+   else
+     {
+       gcc_assert (REG_P (reg));
+       pressure_class = reg_allocno_class (REGNO (reg));
+       pressure_class = ira_pressure_class_translate[pressure_class];
+       *nregs
+ 	= ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
+     }
+   return pressure_class;
+ }
+ 
+ /* Increase (if INCR_P) or decrease current register pressure for
+    register REGNO.  */
+ static void
+ change_pressure (int regno, bool incr_p)
+ {
+   int nregs;
+   enum reg_class pressure_class;
+ 
+   pressure_class = get_regno_pressure_class (regno, &nregs);
+   if (! incr_p)
+     curr_reg_pressure[pressure_class] -= nregs;
+   else
+     {
+       curr_reg_pressure[pressure_class] += nregs;
+       if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+ 	  < curr_reg_pressure[pressure_class])
+ 	BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+ 	  = curr_reg_pressure[pressure_class];
+     }
+ }
+ 
+ /* Calculate register pressure for each basic block by walking insns
+    from last to first.  */
+ static void
+ calculate_bb_reg_pressure (void)
+ {
+   int i;
+   unsigned int j;
+   rtx insn;
+   basic_block bb;
+   bitmap curr_regs_live;
+   bitmap_iterator bi;
+ 
+ 
+   ira_setup_eliminable_regset ();
+   curr_regs_live = BITMAP_ALLOC (&reg_obstack);
+   FOR_EACH_BB (bb)
+     {
+       curr_bb = bb;
+       bitmap_copy (curr_regs_live, DF_LR_OUT (bb));
+       for (i = 0; i < ira_pressure_classes_num; i++)
+ 	curr_reg_pressure[ira_pressure_classes[i]] = 0;
+       EXECUTE_IF_SET_IN_BITMAP (curr_regs_live, 0, j, bi)
+ 	change_pressure (j, true);
+ 
+       FOR_BB_INSNS_REVERSE (bb, insn)
+ 	{
+ 	  rtx dreg;
+ 	  int regno;
+ 	  df_ref *def_rec, *use_rec;
+ 
+ 	  if (! NONDEBUG_INSN_P (insn))
+ 	    continue;
+ 
+ 	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+ 	    {
+ 	      dreg = DF_REF_REAL_REG (*def_rec);
+ 	      gcc_assert (REG_P (dreg));
+ 	      regno = REGNO (dreg);
+ 	      if (!(DF_REF_FLAGS (*def_rec) 
+ 		    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+ 		{
+ 		  if (bitmap_clear_bit (curr_regs_live, regno))
+ 		    change_pressure (regno, false);
+ 		}
+ 	    }
+ 
+ 	  for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+ 	    {
+ 	      dreg = DF_REF_REAL_REG (*use_rec);
+ 	      gcc_assert (REG_P (dreg));
+ 	      regno = REGNO (dreg);
+ 	      if (bitmap_set_bit (curr_regs_live, regno))
+ 		change_pressure (regno, true);
+ 	    }
+ 	}
+     }
+   BITMAP_FREE (curr_regs_live);
+ 
+   if (dump_file == NULL)
+     return;
+ 
+   fprintf (dump_file, "\nRegister Pressure: \n");
+   FOR_EACH_BB (bb)
+     {
+       fprintf (dump_file, "  Basic block %d: \n", bb->index);
+       for (i = 0; (int) i < ira_pressure_classes_num; i++)
+ 	{
+ 	  enum reg_class pressure_class;
+ 
+ 	  pressure_class = ira_pressure_classes[i];
+ 	  if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0)
+ 	    continue;
+ 
+ 	  fprintf (dump_file, "    %s=%d\n", reg_class_names[pressure_class],
+ 		   BB_DATA (bb)->max_reg_pressure[pressure_class]);
+ 	}
+     }
+   fprintf (dump_file, "\n");
+ }
+ 
  /* Top level routine to perform one code hoisting (aka unification) pass
  
     Return nonzero if a change was made.  */
*************** one_code_hoisting_pass (void)
*** 3167,3172 ****
--- 3459,3474 ----
  
    doing_code_hoisting_p = true;
  
+   /* Calculate register pressure for each basic block.  */
+   if (flag_ira_hoist_pressure)
+     {
+       regstat_init_n_sets_and_refs ();
+       ira_set_pseudo_classes (false, dump_file);
+       alloc_aux_for_blocks (sizeof (struct bb_data));
+       calculate_bb_reg_pressure ();
+       regstat_free_n_sets_and_refs ();
+     }
+ 
    /* We need alias.  */
    init_alias_analysis ();
  
*************** one_code_hoisting_pass (void)
*** 3187,3192 ****
--- 3489,3499 ----
        free_code_hoist_mem ();
      }
  
+   if (flag_ira_hoist_pressure)
+     {
+       free_aux_for_blocks ();
+       free_reg_info ();
+     }
    free_hash_table (&expr_hash_table);
    free_gcse_mem ();
    obstack_free (&gcse_obstack, NULL);
diff -x .svn -rpN wc1/gcc/haifa-sched.c wc2/gcc/haifa-sched.c
*** wc1/gcc/haifa-sched.c	2012-10-12 15:13:41.582914633 +0800
--- wc2/gcc/haifa-sched.c	2012-10-12 15:12:15.690110976 +0800
*************** sched_init (void)
*** 6629,6635 ****
  	/* We need info about pseudos for rtl dumps about pseudo
  	   classes and costs.  */
  	regstat_init_n_sets_and_refs ();
!       ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
        sched_regno_pressure_class
  	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
        for (i = 0; i < max_regno; i++)
--- 6629,6635 ----
  	/* We need info about pseudos for rtl dumps about pseudo
  	   classes and costs.  */
  	regstat_init_n_sets_and_refs ();
!       ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
        sched_regno_pressure_class
  	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
        for (i = 0; i < max_regno; i++)
diff -x .svn -rpN wc1/gcc/ira.c wc2/gcc/ira.c
*** wc1/gcc/ira.c	2012-10-12 15:13:41.687618514 +0800
--- wc2/gcc/ira.c	2012-10-12 15:12:15.690110976 +0800
*************** ira (FILE *f)
*** 4183,4189 ****
    crtl->is_leaf = leaf_function_p ();
  
    if (resize_reg_info () && flag_ira_loop_pressure)
!     ira_set_pseudo_classes (ira_dump_file);
  
    rebuild_p = update_equiv_regs ();
  
--- 4183,4189 ----
    crtl->is_leaf = leaf_function_p ();
  
    if (resize_reg_info () && flag_ira_loop_pressure)
!     ira_set_pseudo_classes (true, ira_dump_file);
  
    rebuild_p = update_equiv_regs ();
  
diff -x .svn -rpN wc1/gcc/ira-costs.c wc2/gcc/ira-costs.c
*** wc1/gcc/ira-costs.c	2012-10-12 15:13:41.694110750 +0800
--- wc2/gcc/ira-costs.c	2012-10-12 15:12:15.690110976 +0800
*************** ira_costs (void)
*** 2048,2056 ****
    ira_free (total_allocno_costs);
  }
  
! /* Entry function which defines classes for pseudos.  */
  void
! ira_set_pseudo_classes (FILE *dump_file)
  {
    allocno_p = false;
    internal_flag_ira_verbose = flag_ira_verbose;
--- 2048,2057 ----
    ira_free (total_allocno_costs);
  }
  
! /* Entry function which defines classes for pseudos.
!    Set pseudo_classes_defined_p only if DEFINE_PSEUDO_CLASSES is true.  */
  void
! ira_set_pseudo_classes (bool define_pseudo_classes, FILE *dump_file)
  {
    allocno_p = false;
    internal_flag_ira_verbose = flag_ira_verbose;
*************** ira_set_pseudo_classes (FILE *dump_file)
*** 2059,2065 ****
    initiate_regno_cost_classes ();
    find_costs_and_classes (dump_file);
    finish_regno_cost_classes ();
!   pseudo_classes_defined_p = true;
    finish_costs ();
  }
  
--- 2060,2068 ----
    initiate_regno_cost_classes ();
    find_costs_and_classes (dump_file);
    finish_regno_cost_classes ();
!   if (define_pseudo_classes)
!     pseudo_classes_defined_p = true;
! 
    finish_costs ();
  }
  
diff -x .svn -rpN wc1/gcc/ira.h wc2/gcc/ira.h
*** wc1/gcc/ira.h	2012-10-12 15:13:41.691617905 +0800
--- wc2/gcc/ira.h	2012-10-12 15:12:15.690110976 +0800
***************
*** 1,6 ****
  /* Communication between the Integrated Register Allocator (IRA) and
     the rest of the compiler.
!    Copyright (C) 2006, 2007, 2008, 2009, 2010
     Free Software Foundation, Inc.
     Contributed by Vladimir Makarov <vmakarov@redhat.com>.
  
--- 1,6 ----
  /* Communication between the Integrated Register Allocator (IRA) and
     the rest of the compiler.
!    Copyright (C) 2006, 2007, 2008, 2009, 2010, 2012
     Free Software Foundation, Inc.
     Contributed by Vladimir Makarov <vmakarov@redhat.com>.
  
*************** extern void ira_init (void);
*** 131,137 ****
  extern void ira_finish_once (void);
  extern void ira_setup_eliminable_regset (void);
  extern rtx ira_eliminate_regs (rtx, enum machine_mode);
! extern void ira_set_pseudo_classes (FILE *);
  extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
  
  extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
--- 131,137 ----
  extern void ira_finish_once (void);
  extern void ira_setup_eliminable_regset (void);
  extern rtx ira_eliminate_regs (rtx, enum machine_mode);
! extern void ira_set_pseudo_classes (bool, FILE *);
  extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
  
  extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
diff -x .svn -rpN wc1/gcc/loop-invariant.c wc2/gcc/loop-invariant.c
*** wc1/gcc/loop-invariant.c	2012-10-12 15:13:41.674117845 +0800
--- wc2/gcc/loop-invariant.c	2012-10-12 15:12:15.710111844 +0800
***************
*** 1,5 ****
  /* RTL-level loop invariant motion.
!    Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010
     Free Software Foundation, Inc.
  
  This file is part of GCC.
--- 1,5 ----
  /* RTL-level loop invariant motion.
!    Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012
     Free Software Foundation, Inc.
  
  This file is part of GCC.
*************** move_loop_invariants (void)
*** 1915,1921 ****
      {
        df_analyze ();
        regstat_init_n_sets_and_refs ();
!       ira_set_pseudo_classes (dump_file);
        calculate_loop_reg_pressure ();
        regstat_free_n_sets_and_refs ();
      }
--- 1915,1921 ----
      {
        df_analyze ();
        regstat_init_n_sets_and_refs ();
!       ira_set_pseudo_classes (true, dump_file);
        calculate_loop_reg_pressure ();
        regstat_free_n_sets_and_refs ();
      }
diff -x .svn -rpN wc1/gcc/regmove.c wc2/gcc/regmove.c
*** wc1/gcc/regmove.c	2012-10-12 15:13:41.659610135 +0800
--- wc2/gcc/regmove.c	2012-10-12 15:12:15.710111844 +0800
***************
*** 1,6 ****
  /* Move registers around to reduce number of move instructions needed.
     Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
!    1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
     Free Software Foundation, Inc.
  
  This file is part of GCC.
--- 1,7 ----
  /* Move registers around to reduce number of move instructions needed.
     Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
!    1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
!    2012
     Free Software Foundation, Inc.
  
  This file is part of GCC.
*************** regmove_optimize (void)
*** 1237,1243 ****
    regstat_compute_ri ();
  
    if (flag_ira_loop_pressure)
!     ira_set_pseudo_classes (dump_file);
  
    regno_src_regno = XNEWVEC (int, nregs);
    for (i = nregs; --i >= 0; )
--- 1238,1244 ----
    regstat_compute_ri ();
  
    if (flag_ira_loop_pressure)
!     ira_set_pseudo_classes (true, dump_file);
  
    regno_src_regno = XNEWVEC (int, nregs);
    for (i = nregs; --i >= 0; )
diff -x .svn -rpN wc1/gcc/testsuite/gcc.dg/hoist-register-pressure.c wc2/gcc/testsuite/gcc.dg/hoist-register-pressure.c
*** wc1/gcc/testsuite/gcc.dg/hoist-register-pressure.c	1970-01-01 08:00:00.000000000 +0800
--- wc2/gcc/testsuite/gcc.dg/hoist-register-pressure.c	2012-10-12 15:12:15.710111844 +0800
***************
*** 0 ****
--- 1,31 ----
+ /* { dg-options "-Os -fdump-rtl-hoist" }  */
+ /* { dg-final { scan-rtl-dump "PRE/HOIST: end of bb .* copying expression" "hoist" } } */
+ 
+ #define BUF 100
+ int a[BUF];
+ 
+ void com (int);
+ void bar (int);
+ 
+ int foo (int x, int y, int z)
+ {
+   /* "x+y" won't be hoisted if "-fira-hoist-pressure" is disabled,
+      because its rtx_cost is too small.  */
+   if (z)
+     {
+       a[1] = a[0] + a[2];
+       a[2] = a[1] + a[3];
+       a[3] = a[2] + a[4];
+       a[4] = a[3] + a[5];
+       a[5] = a[4] + a[6];
+       a[6] = a[5] + a[7];
+       a[7] = a[6] + a[8];
+       com (x+y);
+     }
+   else
+     {
+       bar (x+y);
+     }
+ 
+   return 0;
+ }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-10-12  8:20           ` Bin Cheng
@ 2012-10-16  8:50             ` Bin Cheng
  2012-10-16 17:08               ` Jeff Law
  0 siblings, 1 reply; 16+ messages in thread
From: Bin Cheng @ 2012-10-16  8:50 UTC (permalink / raw)
  To: 'Steven Bosscher', Jeff Law; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4066 bytes --]

Hi Steven, Jeff,
I found a flaw in original patch, which results in inaccurate register
pressure.
Here comes the updated patches, improving code size a little bit on
Thumb2/powerpc comparing to original patches.
Please review.

Thanks

2012-10-16  Bin Cheng  <bin.cheng@arm.com>

	* gcse.c: Update copyright dates.
	(hoist_expr_reaches_here_p): Change parameter type from char *
	to sbitmap.

2012-10-16  Bin Cheng  <bin.cheng@arm.com>

	* common.opt (flag_ira_hoist_pressure): New.
	* doc/invoke.texi (-fira-hoist-pressure): Describe.
	* ira-costs.c (ira_set_pseudo_classes): New parameter.
	* ira.h: Update copyright dates.
	(ira_set_pseudo_classes): Update prototype.
	* haifa-sched.c (sched_init): Update call.
	* ira.c (ira): Update call.
	* regmove.c: Update copyright dates.
	(regmove_optimize): Update call.
	* loop-invariant.c: Update copyright dates.
	(move_loop_invariants): Update call.
	* gcse.c: (struct bb_data): New structure.
	(BB_DATA): New macro.
	(curr_bb, curr_reg_pressure): New static variables.
	(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
	Change parameter expr_index to expr.
	New parameters pressure_class, nregs and hoisted_bbs.
	Use reg pressure to determine the distance expr can be hoisted.
	(hoist_code): Use reg pressure to direct the hoist process.
	(get_regno_pressure_class, get_pressure_class_and_nregs)
	(change_pressure, calculate_bb_reg_pressure): New.
	(one_code_hoisting_pass): Calculate register pressure. Allocate
	and free data.

gcc/testsuite/ChangeLog
2012-10-16  Bin Cheng  <bin.cheng@arm.com>

	* testsuite/gcc.dg/hoist-register-pressure.c: New test.


> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org]
On
> Behalf Of Bin Cheng
> Sent: Friday, October 12, 2012 4:09 PM
> To: 'Steven Bosscher'
> Cc: Jeff Law; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH RFA] Implement register pressure directed hoist pass
> 
> Hi,
> This is the updated patches split from original one according to Steven's
> suggestion. Also fixed spelling errors.
> 
> Apart from this, I also implemented a draft patch simulating register
pressure
> accurately during hoisting, unfortunately the size data isn't better than
this
> patch. If it's right, this can prove my previous observation that decrease
of
> register pressure during hoisting process is rare. I will continue
> investigating the correctness of the patch and see what I can get.
> 
> Please review. Thanks
> 
> And the ChangeLog:
> 
> 2012-10-12  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* gcse.c: Update copyright dates.
> 	(hoist_expr_reaches_here_p): Change parameter type from char *
> 	to sbitmap.
> 
> 2012-10-12  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* common.opt (flag_ira_hoist_pressure): New.
> 	* doc/invoke.texi (-fira-hoist-pressure): Describe.
> 	* ira-costs.c (ira_set_pseudo_classes): New parameter.
> 	* ira.h: Update copyright dates.
> 	(ira_set_pseudo_classes): Update prototype.
> 	* haifa-sched.c (sched_init): Update call.
> 	* ira.c (ira): Update call.
> 	* regmove.c: Update copyright dates.
> 	(regmove_optimize): Update call.
> 	* loop-invariant.c: Update copyright dates.
> 	(move_loop_invariants): Update call.
> 	* gcse.c: (struct bb_data): New structure.
> 	(BB_DATA): New macro.
> 	(curr_bb, curr_reg_pressure): New static variables.
> 	(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
> 	Change parameter expr_index to expr.
> 	New parameters pressure_class, nregs and hoisted_bbs.
> 	Use reg pressure to determine the distance expr can be hoisted.
> 	(hoist_code): Use reg pressure to direct the hoist process.
> 	(get_regno_pressure_class, get_pressure_class_and_nregs)
> 	(change_pressure, calculate_bb_reg_pressure): New.
> 	(one_code_hoisting_pass): Calculate register pressure. Allocate
> 	and free data.
> 
> gcc/testsuite/ChangeLog
> 2012-10-12  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* testsuite/gcc.dg/hoist-register-pressure.c: New test.

[-- Attachment #2: hoist-reg-pressure-20121016-002.txt --]
[-- Type: text/plain, Size: 32854 bytes --]

diff -x .svn -rpN wc1/gcc/common.opt wc2/gcc/common.opt
*** wc1/gcc/common.opt	2012-10-12 15:13:41.679617846 +0800
--- wc2/gcc/common.opt	2012-10-16 15:19:35.231674135 +0800
*************** Enum(ira_region) String(all) Value(IRA_R
*** 1392,1397 ****
--- 1392,1402 ----
  EnumValue
  Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
  
+ fira-hoist-pressure
+ Common Report Var(flag_ira_hoist_pressure) Init(1) Optimization
+ Use IRA based register pressure calculation
+ in RTL hoist optimizations.
+ 
  fira-loop-pressure
  Common Report Var(flag_ira_loop_pressure)
  Use IRA based register pressure calculation
diff -x .svn -rpN wc1/gcc/doc/invoke.texi wc2/gcc/doc/invoke.texi
*** wc1/gcc/doc/invoke.texi	2012-10-12 15:13:40.282479595 +0800
--- wc2/gcc/doc/invoke.texi	2012-10-16 15:19:35.239111464 +0800
*************** Objective-C and Objective-C++ Dialects}.
*** 372,378 ****
  -finline-small-functions -fipa-cp -fipa-cp-clone @gol
  -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
  -fira-algorithm=@var{algorithm} @gol
! -fira-region=@var{region} @gol
  -fira-loop-pressure -fno-ira-share-save-slots @gol
  -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
  -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
--- 372,378 ----
  -finline-small-functions -fipa-cp -fipa-cp-clone @gol
  -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
  -fira-algorithm=@var{algorithm} @gol
! -fira-region=@var{region} -fira-hoist-pressure @gol
  -fira-loop-pressure -fno-ira-share-save-slots @gol
  -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
  -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
*************** This typically results in the smallest c
*** 6996,7001 ****
--- 6996,7009 ----
  
  @end table
  
+ @item -fira-hoist-pressure
+ @opindex fira-hoist-pressure
+ Use IRA to evaluate register pressure in the code hoisting pass for
+ decisions to hoist expressions.  This option usually results in smaller
+ code, but it can slow the compiler down.
+ 
+ This option is enabled at level @option{-Os} for all targets.
+ 
  @item -fira-loop-pressure
  @opindex fira-loop-pressure
  Use IRA to evaluate register pressure in loops for decisions to move
diff -x .svn -rpN wc1/gcc/gcse.c wc2/gcc/gcse.c
*** wc1/gcc/gcse.c	2012-10-12 15:45:12.822998377 +0800
--- wc2/gcc/gcse.c	2012-10-16 15:35:30.378109717 +0800
*************** along with GCC; see the file COPYING3.  
*** 20,28 ****
  
  /* TODO
     - reordering of memory allocation and freeing to be more space efficient
!    - do rough calc of how many regs are needed in each block, and a rough
!      calc of how many regs are available in each class and use that to
!      throttle back the code in cases where RTX_COST is minimal.
  */
  
  /* References searched while implementing this.
--- 20,30 ----
  
  /* TODO
     - reordering of memory allocation and freeing to be more space efficient
!    - simulate register pressure change of each basic block accurately during
!      hoist process.  But I doubt the benefit since most expressions hoisted
!      are constant or address, which usually won't reduce register pressure.
!    - calc rough register pressure information and use the info to drive all
!      kinds of code motion (including code hoisting) in a unified way.
  */
  
  /* References searched while implementing this.
*************** along with GCC; see the file COPYING3.  
*** 141,151 ****
  #include "diagnostic-core.h"
  #include "toplev.h"
  
  #include "rtl.h"
  #include "tree.h"
  #include "tm_p.h"
  #include "regs.h"
! #include "hard-reg-set.h"
  #include "flags.h"
  #include "insn-config.h"
  #include "recog.h"
--- 143,154 ----
  #include "diagnostic-core.h"
  #include "toplev.h"
  
+ #include "hard-reg-set.h"
  #include "rtl.h"
  #include "tree.h"
  #include "tm_p.h"
  #include "regs.h"
! #include "ira.h"
  #include "flags.h"
  #include "insn-config.h"
  #include "recog.h"
*************** static bool doing_code_hoisting_p = fals
*** 412,417 ****
--- 415,436 ----
  /* For available exprs */
  static sbitmap *ae_kill;
  \f
+ /* Data stored for each basic block.  */
+ struct bb_data
+ {
+   /* Maximal register pressure inside basic block for given register class
+      (defined only for the pressure classes).  */
+   int max_reg_pressure[N_REG_CLASSES];
+ };
+ 
+ #define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
+ 
+ static basic_block curr_bb;
+ 
+ /* Currently register pressure for each pressure class.  */
+ static int curr_reg_pressure[N_REG_CLASSES];
+ \f
+ 
  static void compute_can_copy (void);
  static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
  static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
*************** static void alloc_code_hoist_mem (int, i
*** 460,468 ****
  static void free_code_hoist_mem (void);
  static void compute_code_hoist_vbeinout (void);
  static void compute_code_hoist_data (void);
! static int hoist_expr_reaches_here_p (basic_block, int, basic_block, sbitmap,
! 				      int, int *);
  static int hoist_code (void);
  static int one_code_hoisting_pass (void);
  static rtx process_insert_insn (struct expr *);
  static int pre_edge_insert (struct edge_list *, struct expr **);
--- 479,489 ----
  static void free_code_hoist_mem (void);
  static void compute_code_hoist_vbeinout (void);
  static void compute_code_hoist_data (void);
! static int should_hoist_expr_to_dom (basic_block, struct expr *, basic_block,
! 				     sbitmap, int, int *, enum reg_class,
! 				     int *, bitmap);
  static int hoist_code (void);
+ static enum reg_class get_pressure_class_and_nregs (rtx insn, int *nregs);
  static int one_code_hoisting_pass (void);
  static rtx process_insert_insn (struct expr *);
  static int pre_edge_insert (struct edge_list *, struct expr **);
*************** prune_expressions (bool pre_p)
*** 1857,1863 ****
  	 a basic block we should account for any side-effects of a subsequent
  	 jump instructions that could clobber the expression.  It would
  	 be best to implement this check along the lines of
! 	 hoist_expr_reaches_here_p where the target block is already known
  	 and, hence, there's no need to conservatively prune expressions on
  	 "intermediate" set-and-jump instructions.  */
        FOR_EACH_EDGE (e, ei, bb->preds)
--- 1878,1884 ----
  	 a basic block we should account for any side-effects of a subsequent
  	 jump instructions that could clobber the expression.  It would
  	 be best to implement this check along the lines of
! 	 should_hoist_expr_to_dom where the target block is already known
  	 and, hence, there's no need to conservatively prune expressions on
  	 "intermediate" set-and-jump instructions.  */
        FOR_EACH_EDGE (e, ei, bb->preds)
*************** compute_code_hoist_data (void)
*** 2825,2834 ****
      fprintf (dump_file, "\n");
  }
  
! /* Determine if the expression identified by EXPR_INDEX would
!    reach BB unimpared if it was placed at the end of EXPR_BB.
!    Stop the search if the expression would need to be moved more
!    than DISTANCE instructions.
  
     It's unclear exactly what Muchnick meant by "unimpared".  It seems
     to me that the expression must either be computed or transparent in
--- 2846,2866 ----
      fprintf (dump_file, "\n");
  }
  
! /* Determine if the expression EXPR should be hoisted to EXPR_BB up in
!    flow graph, if it can reach BB unimpared.  Stop the search if the
!    expression would need to be moved more than DISTANCE instructions.
! 
!    DISTANCE is the number of instructions through which EXPR can be
!    hoisted up in flow graph.
! 
!    BB_SIZE points to an array which contains the number of instructions
!    for each basic block.
! 
!    PRESSURE_CLASS and NREGS are register class and number of hard registers
!    for storing EXPR.
! 
!    HOISTED_BBS points to a bitmap indicating basic blocks through which
!    EXPR is hoisted.
  
     It's unclear exactly what Muchnick meant by "unimpared".  It seems
     to me that the expression must either be computed or transparent in
*************** compute_code_hoist_data (void)
*** 2841,2858 ****
     paths.  */
  
  static int
! hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
! 			   sbitmap visited, int distance, int *bb_size)
  {
    edge pred;
    edge_iterator ei;
    int visited_allocated_locally = 0;
  
    /* Terminate the search if distance, for which EXPR is allowed to move,
       is exhausted.  */
    if (distance > 0)
      {
!       distance -= bb_size[bb->index];
  
        if (distance <= 0)
  	return 0;
--- 2873,2904 ----
     paths.  */
  
  static int
! should_hoist_expr_to_dom (basic_block expr_bb, struct expr *expr,
! 			  basic_block bb, sbitmap visited, int distance,
! 			  int *bb_size, enum reg_class pressure_class,
! 			  int *nregs, bitmap hoisted_bbs)
  {
+   unsigned int i;
    edge pred;
    edge_iterator ei;
+   sbitmap_iterator sbi;
    int visited_allocated_locally = 0;
  
    /* Terminate the search if distance, for which EXPR is allowed to move,
       is exhausted.  */
    if (distance > 0)
      {
!       /* Let EXPR be hoisted through basic block at no cost if the block
! 	 has low register pressure.  An exception is constant expression,
! 	 because hoisting constant expr aggressively results in worse code.
! 	 The exception is made by the observation of CSiBE on ARM target,
! 	 while it has no obvious effect on other targets like x86, x86_64,
! 	 mips and powerpc.  */
!       if (!flag_ira_hoist_pressure
! 	  || (BB_DATA (bb)->max_reg_pressure[pressure_class]
! 		>= ira_class_hard_regs_num[pressure_class]
! 	      || CONST_INT_P (expr->expr)))
! 	distance -= bb_size[bb->index];
  
        if (distance <= 0)
  	return 0;
*************** hoist_expr_reaches_here_p (basic_block e
*** 2877,2897 ****
  	continue;
        else if (TEST_BIT (visited, pred_bb->index))
  	continue;
! 
!       else if (! TEST_BIT (transp[pred_bb->index], expr_index))
  	break;
- 
        /* Not killed.  */
        else
  	{
  	  SET_BIT (visited, pred_bb->index);
! 	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
! 					   visited, distance, bb_size))
  	    break;
  	}
      }
    if (visited_allocated_locally)
!     sbitmap_free (visited);
  
    return (pred == NULL);
  }
--- 2923,2957 ----
  	continue;
        else if (TEST_BIT (visited, pred_bb->index))
  	continue;
!       else if (! TEST_BIT (transp[pred_bb->index], expr->bitmap_index))
  	break;
        /* Not killed.  */
        else
  	{
  	  SET_BIT (visited, pred_bb->index);
! 	  if (! should_hoist_expr_to_dom (expr_bb, expr, pred_bb,
! 					  visited, distance, bb_size,
! 					  pressure_class, nregs, hoisted_bbs))
  	    break;
  	}
      }
    if (visited_allocated_locally)
!     {
!       /* If EXPR can be hoisted to expr_bb, record basic blocks through
! 	 which EXPR is hoisted in hoisted_bbs.  Also update register
! 	 pressure for basic blocks newly added in hoisted_bbs.  */
!       if (flag_ira_hoist_pressure && !pred)
! 	{
! 	  EXECUTE_IF_SET_IN_SBITMAP (visited, 0, i, sbi)
! 	    if (!bitmap_bit_p (hoisted_bbs, i))
! 	      {
! 		bitmap_set_bit (hoisted_bbs, i);
! 		BB_DATA (BASIC_BLOCK (i))->max_reg_pressure[pressure_class]
! 		    += *nregs;
! 	      }
! 	}
!       sbitmap_free (visited);
!     }
  
    return (pred == NULL);
  }
*************** find_occr_in_bb (struct occr *occr, basi
*** 2908,2914 ****
    return occr;
  }
  
! /* Actually perform code hoisting.  */
  
  static int
  hoist_code (void)
--- 2968,3011 ----
    return occr;
  }
  
! /* Actually perform code hoisting.
! 
!    The code hoisting pass can hoist multiple computations of the same
!    expression along dominated path to a dominating basic block, like
!    from b2/b3 to b1 as depicted below:
! 
!           b1      ------
!           /\         |
!          /  \        |
!         bx   by   distance
!        /      \      |
!       /        \     |
!      b2        b3 ------
! 
!    Unfortunately code hoisting generally extends the live range of an
!    output pseudo register, which increases register pressure and hurts
!    register allocation.  To address this issue, an attribute MAX_DISTANCE
!    is computed and attached to each expression.  The attribute is computed
!    from rtx cost of the corresponding expression and it's used to control
!    how long the expression can be hoisted up in flow graph.  As the
!    expression is hoisted up in flow graph, GCC decreases its DISTANCE
!    and stops the hoist if DISTANCE reaches 0.
! 
!    Option "-fira-hoist-pressure" implements register pressure directed
!    hoist based on upper method.  The rationale is:
!      1. Calculate register pressure for each basic block by reusing IRA
! 	facility.
!      2. When expression is hoisted through one basic block, GCC checks
! 	register pressure of the basic block and decrease DISTANCE only
! 	when the register pressure is high.  In other words, expression
! 	will be hoisted through basic block with low register pressure
! 	at no cost.
!      3. Update register pressure information for basic blocks through
!  	which expression is hoisted.
! 	TODO: It is possible to have register pressure decreased because
! 	of shrinked live ranges of input pseudo registers when hoisting
! 	an expression.  For now, this effect is not simulated and we just
! 	increase register pressure for hoisted expressions.  */
  
  static int
  hoist_code (void)
*************** hoist_code (void)
*** 2917,2928 ****
    VEC (basic_block, heap) *dom_tree_walk;
    unsigned int dom_tree_walk_index;
    VEC (basic_block, heap) *domby;
!   unsigned int i,j;
    struct expr **index_map;
    struct expr *expr;
    int *to_bb_head;
    int *bb_size;
    int changed = 0;
  
    /* Compute a mapping from expression number (`bitmap_index') to
       hash table entry.  */
--- 3014,3031 ----
    VEC (basic_block, heap) *dom_tree_walk;
    unsigned int dom_tree_walk_index;
    VEC (basic_block, heap) *domby;
!   unsigned int i, j, k;
    struct expr **index_map;
    struct expr *expr;
    int *to_bb_head;
    int *bb_size;
    int changed = 0;
+   struct bb_data *data;
+   /* Basic blocks that have occurrences reachable from BB.  */
+   bitmap from_bbs;
+   /* Basic blocks through which expr is hoisted.  */
+   bitmap hoisted_bbs = NULL;
+   bitmap_iterator bi;
  
    /* Compute a mapping from expression number (`bitmap_index') to
       hash table entry.  */
*************** hoist_code (void)
*** 2960,2965 ****
--- 3063,3072 ----
  	      && (EDGE_SUCC (ENTRY_BLOCK_PTR, 0)->dest
  		  == ENTRY_BLOCK_PTR->next_bb));
  
+   from_bbs = BITMAP_ALLOC (NULL);
+   if (flag_ira_hoist_pressure)
+     hoisted_bbs = BITMAP_ALLOC (NULL);
+ 
    dom_tree_walk = get_all_dominated_blocks (CDI_DOMINATORS,
  					    ENTRY_BLOCK_PTR->next_bb);
  
*************** hoist_code (void)
*** 2978,2989 ****
  	{
  	  if (TEST_BIT (hoist_vbeout[bb->index], i))
  	    {
  	      /* Current expression.  */
  	      struct expr *expr = index_map[i];
  	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
  	      int hoistable = 0;
- 	      /* Basic blocks that have occurrences reachable from BB.  */
- 	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
  	      /* Occurrences reachable from BB.  */
  	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
  	      /* We want to insert the expression into BB only once, so
--- 3085,3096 ----
  	{
  	  if (TEST_BIT (hoist_vbeout[bb->index], i))
  	    {
+ 	      int nregs = 0;
+ 	      enum reg_class pressure_class = NO_REGS;
  	      /* Current expression.  */
  	      struct expr *expr = index_map[i];
  	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
  	      int hoistable = 0;
  	      /* Occurrences reachable from BB.  */
  	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
  	      /* We want to insert the expression into BB only once, so
*************** hoist_code (void)
*** 2991,2998 ****
  	      int insn_inserted_p;
  	      occr_t occr;
  
- 	      bitmap_initialize (from_bbs, 0);
- 
  	      /* If an expression is computed in BB and is available at end of
  		 BB, hoist all occurrences dominated by BB to BB.  */
  	      if (TEST_BIT (comp[bb->index], i))
--- 3098,3103 ----
*************** hoist_code (void)
*** 3046,3058 ****
  		    max_distance += (bb_size[dominated->index]
  				     - to_bb_head[INSN_UID (occr->insn)]);
  
! 		  /* Note if the expression would reach the dominated block
! 		     unimpared if it was placed at the end of BB.
  
  		     Keep track of how many times this expression is hoistable
  		     from a dominated block into BB.  */
! 		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
! 						 max_distance, bb_size))
  		    {
  		      hoistable++;
  		      VEC_safe_push (occr_t, heap,
--- 3151,3168 ----
  		    max_distance += (bb_size[dominated->index]
  				     - to_bb_head[INSN_UID (occr->insn)]);
  
! 		  pressure_class = get_pressure_class_and_nregs (occr->insn,
! 								 &nregs);
! 
! 		  /* Note if the expression should be hoisted from the dominated
! 		     block to BB if it can reach DOMINATED unimpared.
  
  		     Keep track of how many times this expression is hoistable
  		     from a dominated block into BB.  */
! 		  if (should_hoist_expr_to_dom (bb, expr, dominated, NULL,
! 						max_distance, bb_size,
! 						pressure_class,	&nregs,
! 						hoisted_bbs))
  		    {
  		      hoistable++;
  		      VEC_safe_push (occr_t, heap,
*************** hoist_code (void)
*** 3093,3098 ****
--- 3203,3230 ----
  		/* Punt, no point hoisting a single occurence.  */
  		VEC_free (occr_t, heap, occrs_to_hoist);
  
+ 	      if (flag_ira_hoist_pressure
+ 		  && !VEC_empty (occr_t, occrs_to_hoist))
+ 		{
+ 		  /* Update register pressure for basic block to which expr
+ 		     is hoisted.  */
+ 		  data = BB_DATA (bb);
+ 		  data->max_reg_pressure[pressure_class] += nregs;
+ 		}
+ 	      else if (flag_ira_hoist_pressure)
+ 		{
+ 		  /* Restore register pressure of basic block recorded in
+ 		     hoisted_bbs when expr will not be hoisted.  */
+ 		  EXECUTE_IF_SET_IN_BITMAP (hoisted_bbs, 0, k, bi)
+ 		    {
+ 		      data = BB_DATA (BASIC_BLOCK (k));
+ 		      data->max_reg_pressure[pressure_class] -= nregs;
+ 		    }
+ 		}
+ 
+ 	      if (flag_ira_hoist_pressure)
+ 		bitmap_clear (hoisted_bbs);
+ 
  	      insn_inserted_p = 0;
  
  	      /* Walk through occurrences of I'th expressions we want
*************** hoist_code (void)
*** 3141,3146 ****
--- 3273,3282 ----
      }
  
    VEC_free (basic_block, heap, dom_tree_walk);
+   BITMAP_FREE (from_bbs);
+   if (flag_ira_hoist_pressure)
+     BITMAP_FREE (hoisted_bbs);
+ 
    free (bb_size);
    free (to_bb_head);
    free (index_map);
*************** hoist_code (void)
*** 3148,3153 ****
--- 3284,3448 ----
    return changed;
  }
  
+ /* Return pressure class and number of needed hard registers (through
+    *NREGS) of register REGNO.  */
+ static enum reg_class
+ get_regno_pressure_class (int regno, int *nregs)
+ {
+   if (regno >= FIRST_PSEUDO_REGISTER)
+     {
+       enum reg_class pressure_class;
+ 
+       pressure_class = reg_allocno_class (regno);
+       pressure_class = ira_pressure_class_translate[pressure_class];
+       *nregs
+ 	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
+       return pressure_class;
+     }
+   else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+ 	   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
+     {
+       *nregs = 1;
+       return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
+     }
+   else
+     {
+       *nregs = 0;
+       return NO_REGS;
+     }
+ }
+ 
+ /* Return pressure class and number of hard registers (through *NREGS)
+    for destination of INSN. */
+ static enum reg_class
+ get_pressure_class_and_nregs (rtx insn, int *nregs)
+ {
+   rtx reg;
+   enum reg_class pressure_class;
+   rtx set = single_set (insn);
+ 
+   /* Considered invariant insns have only one set.  */
+   gcc_assert (set != NULL_RTX);
+   reg = SET_DEST (set);
+   if (GET_CODE (reg) == SUBREG)
+     reg = SUBREG_REG (reg);
+   if (MEM_P (reg))
+     {
+       *nregs = 0;
+       pressure_class = NO_REGS;
+     }
+   else
+     {
+       gcc_assert (REG_P (reg));
+       pressure_class = reg_allocno_class (REGNO (reg));
+       pressure_class = ira_pressure_class_translate[pressure_class];
+       *nregs
+ 	= ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
+     }
+   return pressure_class;
+ }
+ 
+ /* Increase (if INCR_P) or decrease current register pressure for
+    register REGNO.  */
+ static void
+ change_pressure (int regno, bool incr_p)
+ {
+   int nregs;
+   enum reg_class pressure_class;
+ 
+   pressure_class = get_regno_pressure_class (regno, &nregs);
+   if (! incr_p)
+     curr_reg_pressure[pressure_class] -= nregs;
+   else
+     {
+       curr_reg_pressure[pressure_class] += nregs;
+       if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+ 	  < curr_reg_pressure[pressure_class])
+ 	BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+ 	  = curr_reg_pressure[pressure_class];
+     }
+ }
+ 
+ /* Calculate register pressure for each basic block by walking insns
+    from last to first.  */
+ static void
+ calculate_bb_reg_pressure (void)
+ {
+   int i;
+   unsigned int j;
+   rtx insn;
+   basic_block bb;
+   bitmap curr_regs_live;
+   bitmap_iterator bi;
+ 
+ 
+   ira_setup_eliminable_regset ();
+   curr_regs_live = BITMAP_ALLOC (&reg_obstack);
+   FOR_EACH_BB (bb)
+     {
+       curr_bb = bb;
+       bitmap_copy (curr_regs_live, DF_LR_OUT (bb));
+       for (i = 0; i < ira_pressure_classes_num; i++)
+ 	curr_reg_pressure[ira_pressure_classes[i]] = 0;
+       EXECUTE_IF_SET_IN_BITMAP (curr_regs_live, 0, j, bi)
+ 	change_pressure (j, true);
+ 
+       FOR_BB_INSNS_REVERSE (bb, insn)
+ 	{
+ 	  rtx dreg;
+ 	  int regno;
+ 	  df_ref *def_rec, *use_rec;
+ 
+ 	  if (! NONDEBUG_INSN_P (insn))
+ 	    continue;
+ 
+ 	  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+ 	    {
+ 	      dreg = DF_REF_REAL_REG (*def_rec);
+ 	      gcc_assert (REG_P (dreg));
+ 	      regno = REGNO (dreg);
+ 	      if (!(DF_REF_FLAGS (*def_rec) 
+ 		    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+ 		{
+ 		  if (bitmap_clear_bit (curr_regs_live, regno))
+ 		    change_pressure (regno, false);
+ 		}
+ 	    }
+ 
+ 	  for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+ 	    {
+ 	      dreg = DF_REF_REAL_REG (*use_rec);
+ 	      gcc_assert (REG_P (dreg));
+ 	      regno = REGNO (dreg);
+ 	      if (bitmap_set_bit (curr_regs_live, regno))
+ 		change_pressure (regno, true);
+ 	    }
+ 	}
+     }
+   BITMAP_FREE (curr_regs_live);
+ 
+   if (dump_file == NULL)
+     return;
+ 
+   fprintf (dump_file, "\nRegister Pressure: \n");
+   FOR_EACH_BB (bb)
+     {
+       fprintf (dump_file, "  Basic block %d: \n", bb->index);
+       for (i = 0; (int) i < ira_pressure_classes_num; i++)
+ 	{
+ 	  enum reg_class pressure_class;
+ 
+ 	  pressure_class = ira_pressure_classes[i];
+ 	  if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0)
+ 	    continue;
+ 
+ 	  fprintf (dump_file, "    %s=%d\n", reg_class_names[pressure_class],
+ 		   BB_DATA (bb)->max_reg_pressure[pressure_class]);
+ 	}
+     }
+   fprintf (dump_file, "\n");
+ }
+ 
  /* Top level routine to perform one code hoisting (aka unification) pass
  
     Return nonzero if a change was made.  */
*************** one_code_hoisting_pass (void)
*** 3167,3172 ****
--- 3462,3477 ----
  
    doing_code_hoisting_p = true;
  
+   /* Calculate register pressure for each basic block.  */
+   if (flag_ira_hoist_pressure)
+     {
+       regstat_init_n_sets_and_refs ();
+       ira_set_pseudo_classes (false, dump_file);
+       alloc_aux_for_blocks (sizeof (struct bb_data));
+       calculate_bb_reg_pressure ();
+       regstat_free_n_sets_and_refs ();
+     }
+ 
    /* We need alias.  */
    init_alias_analysis ();
  
*************** one_code_hoisting_pass (void)
*** 3187,3192 ****
--- 3492,3502 ----
        free_code_hoist_mem ();
      }
  
+   if (flag_ira_hoist_pressure)
+     {
+       free_aux_for_blocks ();
+       free_reg_info ();
+     }
    free_hash_table (&expr_hash_table);
    free_gcse_mem ();
    obstack_free (&gcse_obstack, NULL);
diff -x .svn -rpN wc1/gcc/haifa-sched.c wc2/gcc/haifa-sched.c
*** wc1/gcc/haifa-sched.c	2012-10-12 15:13:41.582914633 +0800
--- wc2/gcc/haifa-sched.c	2012-10-16 15:19:35.243118188 +0800
*************** sched_init (void)
*** 6629,6635 ****
  	/* We need info about pseudos for rtl dumps about pseudo
  	   classes and costs.  */
  	regstat_init_n_sets_and_refs ();
!       ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
        sched_regno_pressure_class
  	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
        for (i = 0; i < max_regno; i++)
--- 6629,6635 ----
  	/* We need info about pseudos for rtl dumps about pseudo
  	   classes and costs.  */
  	regstat_init_n_sets_and_refs ();
!       ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
        sched_regno_pressure_class
  	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
        for (i = 0; i < max_regno; i++)
diff -x .svn -rpN wc1/gcc/ira.c wc2/gcc/ira.c
*** wc1/gcc/ira.c	2012-10-12 15:13:41.687618514 +0800
--- wc2/gcc/ira.c	2012-10-16 15:19:35.246119338 +0800
*************** ira (FILE *f)
*** 4183,4189 ****
    crtl->is_leaf = leaf_function_p ();
  
    if (resize_reg_info () && flag_ira_loop_pressure)
!     ira_set_pseudo_classes (ira_dump_file);
  
    rebuild_p = update_equiv_regs ();
  
--- 4183,4189 ----
    crtl->is_leaf = leaf_function_p ();
  
    if (resize_reg_info () && flag_ira_loop_pressure)
!     ira_set_pseudo_classes (true, ira_dump_file);
  
    rebuild_p = update_equiv_regs ();
  
diff -x .svn -rpN wc1/gcc/ira-costs.c wc2/gcc/ira-costs.c
*** wc1/gcc/ira-costs.c	2012-10-12 15:13:41.694110750 +0800
--- wc2/gcc/ira-costs.c	2012-10-16 15:19:35.246119338 +0800
*************** ira_costs (void)
*** 2048,2056 ****
    ira_free (total_allocno_costs);
  }
  
! /* Entry function which defines classes for pseudos.  */
  void
! ira_set_pseudo_classes (FILE *dump_file)
  {
    allocno_p = false;
    internal_flag_ira_verbose = flag_ira_verbose;
--- 2048,2057 ----
    ira_free (total_allocno_costs);
  }
  
! /* Entry function which defines classes for pseudos.
!    Set pseudo_classes_defined_p only if DEFINE_PSEUDO_CLASSES is true.  */
  void
! ira_set_pseudo_classes (bool define_pseudo_classes, FILE *dump_file)
  {
    allocno_p = false;
    internal_flag_ira_verbose = flag_ira_verbose;
*************** ira_set_pseudo_classes (FILE *dump_file)
*** 2059,2065 ****
    initiate_regno_cost_classes ();
    find_costs_and_classes (dump_file);
    finish_regno_cost_classes ();
!   pseudo_classes_defined_p = true;
    finish_costs ();
  }
  
--- 2060,2068 ----
    initiate_regno_cost_classes ();
    find_costs_and_classes (dump_file);
    finish_regno_cost_classes ();
!   if (define_pseudo_classes)
!     pseudo_classes_defined_p = true;
! 
    finish_costs ();
  }
  
diff -x .svn -rpN wc1/gcc/ira.h wc2/gcc/ira.h
*** wc1/gcc/ira.h	2012-10-12 15:13:41.691617905 +0800
--- wc2/gcc/ira.h	2012-10-16 15:19:35.246119338 +0800
***************
*** 1,6 ****
  /* Communication between the Integrated Register Allocator (IRA) and
     the rest of the compiler.
!    Copyright (C) 2006, 2007, 2008, 2009, 2010
     Free Software Foundation, Inc.
     Contributed by Vladimir Makarov <vmakarov@redhat.com>.
  
--- 1,6 ----
  /* Communication between the Integrated Register Allocator (IRA) and
     the rest of the compiler.
!    Copyright (C) 2006, 2007, 2008, 2009, 2010, 2012
     Free Software Foundation, Inc.
     Contributed by Vladimir Makarov <vmakarov@redhat.com>.
  
*************** extern void ira_init (void);
*** 131,137 ****
  extern void ira_finish_once (void);
  extern void ira_setup_eliminable_regset (void);
  extern rtx ira_eliminate_regs (rtx, enum machine_mode);
! extern void ira_set_pseudo_classes (FILE *);
  extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
  
  extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
--- 131,137 ----
  extern void ira_finish_once (void);
  extern void ira_setup_eliminable_regset (void);
  extern rtx ira_eliminate_regs (rtx, enum machine_mode);
! extern void ira_set_pseudo_classes (bool, FILE *);
  extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
  
  extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
diff -x .svn -rpN wc1/gcc/loop-invariant.c wc2/gcc/loop-invariant.c
*** wc1/gcc/loop-invariant.c	2012-10-12 15:13:41.674117845 +0800
--- wc2/gcc/loop-invariant.c	2012-10-16 15:19:35.246119338 +0800
***************
*** 1,5 ****
  /* RTL-level loop invariant motion.
!    Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010
     Free Software Foundation, Inc.
  
  This file is part of GCC.
--- 1,5 ----
  /* RTL-level loop invariant motion.
!    Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012
     Free Software Foundation, Inc.
  
  This file is part of GCC.
*************** move_loop_invariants (void)
*** 1915,1921 ****
      {
        df_analyze ();
        regstat_init_n_sets_and_refs ();
!       ira_set_pseudo_classes (dump_file);
        calculate_loop_reg_pressure ();
        regstat_free_n_sets_and_refs ();
      }
--- 1915,1921 ----
      {
        df_analyze ();
        regstat_init_n_sets_and_refs ();
!       ira_set_pseudo_classes (true, dump_file);
        calculate_loop_reg_pressure ();
        regstat_free_n_sets_and_refs ();
      }
diff -x .svn -rpN wc1/gcc/regmove.c wc2/gcc/regmove.c
*** wc1/gcc/regmove.c	2012-10-12 15:13:41.659610135 +0800
--- wc2/gcc/regmove.c	2012-10-16 15:19:35.246119338 +0800
***************
*** 1,6 ****
  /* Move registers around to reduce number of move instructions needed.
     Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
!    1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
     Free Software Foundation, Inc.
  
  This file is part of GCC.
--- 1,7 ----
  /* Move registers around to reduce number of move instructions needed.
     Copyright (C) 1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
!    1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
!    2012
     Free Software Foundation, Inc.
  
  This file is part of GCC.
*************** regmove_optimize (void)
*** 1237,1243 ****
    regstat_compute_ri ();
  
    if (flag_ira_loop_pressure)
!     ira_set_pseudo_classes (dump_file);
  
    regno_src_regno = XNEWVEC (int, nregs);
    for (i = nregs; --i >= 0; )
--- 1238,1244 ----
    regstat_compute_ri ();
  
    if (flag_ira_loop_pressure)
!     ira_set_pseudo_classes (true, dump_file);
  
    regno_src_regno = XNEWVEC (int, nregs);
    for (i = nregs; --i >= 0; )
diff -x .svn -rpN wc1/gcc/testsuite/gcc.dg/hoist-register-pressure.c wc2/gcc/testsuite/gcc.dg/hoist-register-pressure.c
*** wc1/gcc/testsuite/gcc.dg/hoist-register-pressure.c	1970-01-01 08:00:00.000000000 +0800
--- wc2/gcc/testsuite/gcc.dg/hoist-register-pressure.c	2012-10-16 15:19:35.246119338 +0800
***************
*** 0 ****
--- 1,31 ----
+ /* { dg-options "-Os -fdump-rtl-hoist" }  */
+ /* { dg-final { scan-rtl-dump "PRE/HOIST: end of bb .* copying expression" "hoist" } } */
+ 
+ #define BUF 100
+ int a[BUF];
+ 
+ void com (int);
+ void bar (int);
+ 
+ int foo (int x, int y, int z)
+ {
+   /* "x+y" won't be hoisted if "-fira-hoist-pressure" is disabled,
+      because its rtx_cost is too small.  */
+   if (z)
+     {
+       a[1] = a[0] + a[2];
+       a[2] = a[1] + a[3];
+       a[3] = a[2] + a[4];
+       a[4] = a[3] + a[5];
+       a[5] = a[4] + a[6];
+       a[6] = a[5] + a[7];
+       a[7] = a[6] + a[8];
+       com (x+y);
+     }
+   else
+     {
+       bar (x+y);
+     }
+ 
+   return 0;
+ }

[-- Attachment #3: hoist-reg-pressure-20121016-001.txt --]
[-- Type: text/plain, Size: 2337 bytes --]

Index: gcc/gcse.c
===================================================================
--- gcc/gcse.c	(revision 192194)
+++ gcc/gcse.c	(working copy)
@@ -1,6 +1,6 @@
 /* Partial redundancy elimination / Hoisting for RTL.
    Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
-   2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -460,7 +460,7 @@ static void alloc_code_hoist_mem (int, int);
 static void free_code_hoist_mem (void);
 static void compute_code_hoist_vbeinout (void);
 static void compute_code_hoist_data (void);
-static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *,
+static int hoist_expr_reaches_here_p (basic_block, int, basic_block, sbitmap,
 				      int, int *);
 static int hoist_code (void);
 static int one_code_hoisting_pass (void);
@@ -2842,7 +2842,7 @@ compute_code_hoist_data (void)
 
 static int
 hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
-			   char *visited, int distance, int *bb_size)
+			   sbitmap visited, int distance, int *bb_size)
 {
   edge pred;
   edge_iterator ei;
@@ -2863,7 +2863,8 @@ hoist_expr_reaches_here_p (basic_block expr_bb, in
   if (visited == NULL)
     {
       visited_allocated_locally = 1;
-      visited = XCNEWVEC (char, last_basic_block);
+      visited = sbitmap_alloc (last_basic_block);
+      sbitmap_zero (visited);
     }
 
   FOR_EACH_EDGE (pred, ei, bb->preds)
@@ -2874,7 +2875,7 @@ hoist_expr_reaches_here_p (basic_block expr_bb, in
 	break;
       else if (pred_bb == expr_bb)
 	continue;
-      else if (visited[pred_bb->index])
+      else if (TEST_BIT (visited, pred_bb->index))
 	continue;
 
       else if (! TEST_BIT (transp[pred_bb->index], expr_index))
@@ -2883,14 +2884,14 @@ hoist_expr_reaches_here_p (basic_block expr_bb, in
       /* Not killed.  */
       else
 	{
-	  visited[pred_bb->index] = 1;
+	  SET_BIT (visited, pred_bb->index);
 	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
 					   visited, distance, bb_size))
 	    break;
 	}
     }
   if (visited_allocated_locally)
-    free (visited);
+    sbitmap_free (visited);
 
   return (pred == NULL);
 }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFA] Implement register pressure directed hoist pass
  2012-10-16  8:50             ` Bin Cheng
@ 2012-10-16 17:08               ` Jeff Law
  2012-10-19  6:54                 ` Bin Cheng
  0 siblings, 1 reply; 16+ messages in thread
From: Jeff Law @ 2012-10-16 17:08 UTC (permalink / raw)
  To: Bin Cheng; +Cc: 'Steven Bosscher', gcc-patches

On 10/16/2012 01:44 AM, Bin Cheng wrote:
> Hi Steven, Jeff,
> I found a flaw in original patch, which results in inaccurate register
> pressure.
> Here comes the updated patches, improving code size a little bit on
> Thumb2/powerpc comparing to original patches.
> Please review.
>
> Thanks
>
> 2012-10-16  Bin Cheng<bin.cheng@arm.com>
>
> 	* gcse.c: Update copyright dates.
> 	(hoist_expr_reaches_here_p): Change parameter type from char *
> 	to sbitmap.
>
> 2012-10-16  Bin Cheng<bin.cheng@arm.com>
>
> 	* common.opt (flag_ira_hoist_pressure): New.
> 	* doc/invoke.texi (-fira-hoist-pressure): Describe.
> 	* ira-costs.c (ira_set_pseudo_classes): New parameter.
> 	* ira.h: Update copyright dates.
> 	(ira_set_pseudo_classes): Update prototype.
> 	* haifa-sched.c (sched_init): Update call.
> 	* ira.c (ira): Update call.
> 	* regmove.c: Update copyright dates.
> 	(regmove_optimize): Update call.
> 	* loop-invariant.c: Update copyright dates.
> 	(move_loop_invariants): Update call.
> 	* gcse.c: (struct bb_data): New structure.
> 	(BB_DATA): New macro.
> 	(curr_bb, curr_reg_pressure): New static variables.
> 	(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
> 	Change parameter expr_index to expr.
> 	New parameters pressure_class, nregs and hoisted_bbs.
> 	Use reg pressure to determine the distance expr can be hoisted.
> 	(hoist_code): Use reg pressure to direct the hoist process.
> 	(get_regno_pressure_class, get_pressure_class_and_nregs)
> 	(change_pressure, calculate_bb_reg_pressure): New.
> 	(one_code_hoisting_pass): Calculate register pressure. Allocate
> 	and free data.



> +
> + /* Currently register pressure for each pressure class.  */
> + static int curr_reg_pressure[N_REG_CLASSES];
"Currently" -> "Current"

OK for the mainline sources.

Thanks for your patience,
Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFA] Implement register pressure directed hoist pass
  2012-10-16 17:08               ` Jeff Law
@ 2012-10-19  6:54                 ` Bin Cheng
  0 siblings, 0 replies; 16+ messages in thread
From: Bin Cheng @ 2012-10-19  6:54 UTC (permalink / raw)
  To: 'Jeff Law'; +Cc: 'Steven Bosscher', gcc-patches



> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Wednesday, October 17, 2012 1:02 AM
> To: Bin Cheng
> Cc: 'Steven Bosscher'; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH RFA] Implement register pressure directed hoist pass
> 
> On 10/16/2012 01:44 AM, Bin Cheng wrote:
> > Hi Steven, Jeff,
> > I found a flaw in original patch, which results in inaccurate register
> > pressure.
> > Here comes the updated patches, improving code size a little bit on
> > Thumb2/powerpc comparing to original patches.
> > Please review.
> >
> > Thanks
> >
> > 2012-10-16  Bin Cheng<bin.cheng@arm.com>
> >
> > 	* gcse.c: Update copyright dates.
> > 	(hoist_expr_reaches_here_p): Change parameter type from char *
> > 	to sbitmap.
> >
> > 2012-10-16  Bin Cheng<bin.cheng@arm.com>
> >
> > 	* common.opt (flag_ira_hoist_pressure): New.
> > 	* doc/invoke.texi (-fira-hoist-pressure): Describe.
> > 	* ira-costs.c (ira_set_pseudo_classes): New parameter.
> > 	* ira.h: Update copyright dates.
> > 	(ira_set_pseudo_classes): Update prototype.
> > 	* haifa-sched.c (sched_init): Update call.
> > 	* ira.c (ira): Update call.
> > 	* regmove.c: Update copyright dates.
> > 	(regmove_optimize): Update call.
> > 	* loop-invariant.c: Update copyright dates.
> > 	(move_loop_invariants): Update call.
> > 	* gcse.c: (struct bb_data): New structure.
> > 	(BB_DATA): New macro.
> > 	(curr_bb, curr_reg_pressure): New static variables.
> > 	(should_hoist_expr_to_dom): Rename from hoist_expr_reaches_here_p.
> > 	Change parameter expr_index to expr.
> > 	New parameters pressure_class, nregs and hoisted_bbs.
> > 	Use reg pressure to determine the distance expr can be hoisted.
> > 	(hoist_code): Use reg pressure to direct the hoist process.
> > 	(get_regno_pressure_class, get_pressure_class_and_nregs)
> > 	(change_pressure, calculate_bb_reg_pressure): New.
> > 	(one_code_hoisting_pass): Calculate register pressure. Allocate
> > 	and free data.
> 
> 
> 
> > +
> > + /* Currently register pressure for each pressure class.  */ static
> > + int curr_reg_pressure[N_REG_CLASSES];
> "Currently" -> "Current"
> 
> OK for the mainline sources.
> 

Committed to trunk as r192603/r192604, after re-testing against recent trunk
codes.

Thanks.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH RFA] Implement register pressure directed hoist pass
@ 2012-09-28  8:21 Bin Cheng
  0 siblings, 0 replies; 16+ messages in thread
From: Bin Cheng @ 2012-09-28  8:21 UTC (permalink / raw)
  To: gcc-patches
  Cc: 'Eric Botcazou', 'Richard Sandiford', vmakarov

[-- Attachment #1: Type: text/plain, Size: 3164 bytes --]

Hi,

This patch implements register pressure directed hoist pass. Basically it
calculates register pressure for each basic block and use that information
to determine the hoist distance of each candidate expression. The register
pressure is calculated by re-using IRA utilities.

I measured the benefit on Thumb1/Thumb2/ARM/x86/MIPS instruction sets and
targets. For CSiBE, it improves code size by more than 0.1% on thumb1/ARM
instruction set; it improves code size nearly 0.2% on MIPS, all with very
small regressions. Since the hoist itself improves code size by only about
0.1% on Thumb1 instruction set, this is considerable improvement.
Unfortunately this patch has no obvious effect on Thumb2 and X86, so
currently I enabled it on Thumb1 when optimizing for size. Other targets can
take advantage of it as necessary after upstream.

Apart from the change in hoist pass, this patch also changes prototype of
function ira_set_pseudo_classes in IRA. This change is to make IRA
re-calculate cost information by itself, rather than re-using the info
calculated by hoist pass, because hoist is an early pass and the information
cannot be used directly in IRA. You can refer to
http://gcc.gnu.org/ml/gcc/2012-08/msg00299.html for some discussion.

I bootstrap gcc x86_64 on trunk r190769, since the head revision when I was
working on this fails bootstrap that time. The results are:
	bootstrap                        time(real/user/sys)
	trunk/Os                         122m9s/118m30s/19m48s
	patched/Os                       122m20s/118m19s/19m52s
	patched/Os/fira-hoist-pressure   120m47s/119m9s/19m38s
It seems the patch has no obvious slowdown on gcc compilation time. I also
measured miscellaneous binaries generated like cc1/cc1plus, the code size of
text section has been improved by only 0.05% on x86_64, not as obvious as
ARM/MIPS(0.1-0.2%).

I ran regression test on cortex-m3/cortex-m0/X86 with Os and everything was
fine.

Is it ok for upstream?

Thanks

2012-09-28  Bin Cheng  <bin.cheng@arm.com>

	* common.opt (flag_ira_hoist_pressure): New.
	* doc/invoke.texi (-fira-hoist-pressure): Describe.
	* ira-costs.c (ira_set_pseudo_classes): New parameter.
	* ira.h (ira_set_pseudo_classes): Update prototype.
	* haifa-sched.c (sched_init): Update call.
	* ira.c (ira): Update call.
	* regmove.c (regmove_optimize): Update call.
	* loop-invariant.c (move_loop_invariants): Update call.
	* gcse.c (struct bb_data): New structure.
	(BB_DATA): New macro.
	(curr_bb, curr_regs_live, curr_reg_pressure, regs_set, n_regs_set):
New
	static variables.
	(hoist_expr_reaches_here_p): Use reg pressure to determin the
distance
	expr can be hoisted.
	(hoist_code): Use reg pressure to direct the hoist process.
	(get_regno_pressure_class, get_pressure_class_and_nregs)
	(change_pressure, mark_regno_live, mark_regno_death, mark_reg_death)
	(mark_reg_store, mark_reg_clobber, calculate_bb_reg_pressure)
	(free_bb_data): New.
	(one_code_hoisting_pass): Calculate register pressure. Free data.
	* config/arm/arm.c (arm_option_override): Set
flag_ira_hoist_pressure
	on Thumb1 when optimizing for size.

[-- Attachment #2: hoist-reg-pressure-20120928.txt --]
[-- Type: text/plain, Size: 23985 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 191816)
+++ gcc/doc/invoke.texi	(working copy)
@@ -370,7 +370,7 @@ Objective-C and Objective-C++ Dialects}.
 -finline-small-functions -fipa-cp -fipa-cp-clone @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference @gol
 -fira-algorithm=@var{algorithm} @gol
--fira-region=@var{region} @gol
+-fira-region=@var{region} -fira-hoist-pressure @gol
 -fira-loop-pressure -fno-ira-share-save-slots @gol
 -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol
 -fivopts -fkeep-inline-functions -fkeep-static-consts @gol
@@ -6904,6 +6904,14 @@ This typically results in the smallest code size,
 
 @end table
 
+@item -fira-hoist-pressure
+@opindex fira-hoist-pressure
+Use IRA to evaluate register pressure in hoist pass for decisions to hoist
+expressions.  This option usually results in generation of smaller code on
+RISC machines, but it can slow the compiler down.
+
+This option is enabled at level @option{-Os} for some targets.
+
 @item -fira-loop-pressure
 @opindex fira-loop-pressure
 Use IRA to evaluate register pressure in loops for decisions to move
Index: gcc/haifa-sched.c
===================================================================
--- gcc/haifa-sched.c	(revision 191816)
+++ gcc/haifa-sched.c	(working copy)
@@ -6629,7 +6629,7 @@ sched_init (void)
 	/* We need info about pseudos for rtl dumps about pseudo
 	   classes and costs.  */
 	regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (sched_verbose ? sched_dump : NULL);
+      ira_set_pseudo_classes (true, sched_verbose ? sched_dump : NULL);
       sched_regno_pressure_class
 	= (enum reg_class *) xmalloc (max_regno * sizeof (enum reg_class));
       for (i = 0; i < max_regno; i++)
Index: gcc/regmove.c
===================================================================
--- gcc/regmove.c	(revision 191816)
+++ gcc/regmove.c	(working copy)
@@ -1237,7 +1237,7 @@ regmove_optimize (void)
   regstat_compute_ri ();
 
   if (flag_ira_loop_pressure)
-    ira_set_pseudo_classes (dump_file);
+    ira_set_pseudo_classes (true, dump_file);
 
   regno_src_regno = XNEWVEC (int, nregs);
   for (i = nregs; --i >= 0; )
Index: gcc/gcse.c
===================================================================
--- gcc/gcse.c	(revision 191816)
+++ gcc/gcse.c	(working copy)
@@ -20,9 +20,9 @@ along with GCC; see the file COPYING3.  If not see
 
 /* TODO
    - reordering of memory allocation and freeing to be more space efficient
-   - do rough calc of how many regs are needed in each block, and a rough
-     calc of how many regs are available in each class and use that to
-     throttle back the code in cases where RTX_COST is minimal.
+   - simulate register pressure change of each basic block accurately during
+     hoist process. But I doubt the benefit since most expressions hoisted
+     are constant or address, which usually won't reduce register pressure.
 */
 
 /* References searched while implementing this.
@@ -141,11 +141,12 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h"
 #include "toplev.h"
 
+#include "hard-reg-set.h"
 #include "rtl.h"
 #include "tree.h"
 #include "tm_p.h"
 #include "regs.h"
-#include "hard-reg-set.h"
+#include "ira.h"
 #include "flags.h"
 #include "insn-config.h"
 #include "recog.h"
@@ -412,6 +413,33 @@ static bool doing_code_hoisting_p = false;
 /* For available exprs */
 static sbitmap *ae_kill;
 \f
+/* Data stored for each basic block.  */
+struct bb_data
+{
+  /* Maximal register pressure inside basic block for given register class
+     (defined only for the pressure classes).  */
+  int max_reg_pressure[N_REG_CLASSES];
+};
+
+#define BB_DATA(bb) ((struct bb_data *) (bb)->aux)
+
+static basic_block curr_bb;
+
+/* Register currently living.  */
+static bitmap_head curr_regs_live;
+
+/* Currently register pressure for each pressure class.  */
+static int curr_reg_pressure[N_REG_CLASSES];
+
+/* Record all regs that are set in any one insn.  Communication from
+   mark_reg_{store,clobber} and global_conflicts.  Asm can refer to
+   all hard-registers.  */
+static rtx regs_set[(FIRST_PSEUDO_REGISTER > MAX_RECOG_OPERANDS
+		     ? FIRST_PSEUDO_REGISTER : MAX_RECOG_OPERANDS) * 2];
+/* Number of regs stored in the previous array.  */
+static int n_regs_set;
+\f
+
 static void compute_can_copy (void);
 static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
 static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
@@ -460,9 +488,11 @@ static void alloc_code_hoist_mem (int, int);
 static void free_code_hoist_mem (void);
 static void compute_code_hoist_vbeinout (void);
 static void compute_code_hoist_data (void);
-static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *,
-				      int, int *);
+static int hoist_expr_reaches_here_p (basic_block, struct expr*, basic_block,
+				      sbitmap, int, int *, enum reg_class,
+				      int *, bitmap_head *);
 static int hoist_code (void);
+static enum reg_class get_pressure_class_and_nregs (rtx insn, int *nregs);
 static int one_code_hoisting_pass (void);
 static rtx process_insert_insn (struct expr *);
 static int pre_edge_insert (struct edge_list *, struct expr **);
@@ -2825,11 +2855,16 @@ compute_code_hoist_data (void)
     fprintf (dump_file, "\n");
 }
 
-/* Determine if the expression identified by EXPR_INDEX would
-   reach BB unimpared if it was placed at the end of EXPR_BB.
-   Stop the search if the expression would need to be moved more
-   than DISTANCE instructions.
+/* Determine if the expression EXPR would reach BB unimpared if it was
+   placed at the end of EXPR_BB. Stop the search if the expression would
+   need to be moved more than DISTANCE instructions.
 
+   PRESSURE_CLASS and NREGS are register class and number of hard registers
+   for storing EXPR. 
+
+   HOISTED_BBS points to a bitmap indicating basic blocks through which
+   EXPR is hoisted.
+
    It's unclear exactly what Muchnick meant by "unimpared".  It seems
    to me that the expression must either be computed or transparent in
    *every* block in the path(s) from EXPR_BB to BB.  Any other definition
@@ -2841,18 +2876,29 @@ compute_code_hoist_data (void)
    paths.  */
 
 static int
-hoist_expr_reaches_here_p (basic_block expr_bb, int expr_index, basic_block bb,
-			   char *visited, int distance, int *bb_size)
+hoist_expr_reaches_here_p (basic_block expr_bb, struct expr *expr,
+			   basic_block bb, sbitmap visited, int distance,
+			   int *bb_size, enum reg_class pressure_class,
+			   int *nregs, bitmap_head *hoisted_bbs)
 {
+  unsigned int i;
   edge pred;
   edge_iterator ei;
+  sbitmap_iterator sbi;
   int visited_allocated_locally = 0;
 
   /* Terminate the search if distance, for which EXPR is allowed to move,
      is exhausted.  */
   if (distance > 0)
     {
-      distance -= bb_size[bb->index];
+      /* Only decrease distance if bb has high register pressure or EXPR
+	 is const expr, otherwise EXPR can be hoisted through bb without
+	 cost.  */
+      if (flag_ira_hoist_pressure != 1
+	  || (BB_DATA (bb)->max_reg_pressure[pressure_class]
+		>= ira_class_hard_regs_num[pressure_class]
+	      || CONST_INT_P (expr->expr)))
+	distance -= bb_size[bb->index];
 
       if (distance <= 0)
 	return 0;
@@ -2863,7 +2909,8 @@ static int
   if (visited == NULL)
     {
       visited_allocated_locally = 1;
-      visited = XCNEWVEC (char, last_basic_block);
+      visited = sbitmap_alloc (last_basic_block);
+      sbitmap_zero (visited);
     }
 
   FOR_EACH_EDGE (pred, ei, bb->preds)
@@ -2874,23 +2921,37 @@ static int
 	break;
       else if (pred_bb == expr_bb)
 	continue;
-      else if (visited[pred_bb->index])
+      else if (TEST_BIT (visited, pred_bb->index))
 	continue;
-
-      else if (! TEST_BIT (transp[pred_bb->index], expr_index))
+      else if (! TEST_BIT (transp[pred_bb->index], expr->bitmap_index))
 	break;
-
       /* Not killed.  */
       else
 	{
-	  visited[pred_bb->index] = 1;
-	  if (! hoist_expr_reaches_here_p (expr_bb, expr_index, pred_bb,
-					   visited, distance, bb_size))
+	  SET_BIT (visited, pred_bb->index);
+	  if (! hoist_expr_reaches_here_p (expr_bb, expr, pred_bb,
+					   visited, distance, bb_size,
+					   pressure_class, nregs, hoisted_bbs))
 	    break;
 	}
     }
   if (visited_allocated_locally)
-    free (visited);
+    {
+      /* If EXPR can be hoisted to expr_bb, record basic blocks through
+	 which EXPR is hoisted in hoisted_bbs. Also update register
+	 pressure for basic blocks newly added in hoisted_bbs.  */
+      if ((flag_ira_hoist_pressure == 1) && !pred)
+	{
+	  EXECUTE_IF_SET_IN_SBITMAP (visited, 0, i, sbi)
+	    if (!bitmap_bit_p (hoisted_bbs, i))
+	      {
+		bitmap_set_bit (hoisted_bbs, i);
+		BB_DATA (BASIC_BLOCK (i))->max_reg_pressure[pressure_class]
+		    += *nregs;
+	      }
+	}
+      sbitmap_free (visited);
+    }
 
   return (pred == NULL);
 }
@@ -2916,12 +2977,14 @@ hoist_code (void)
   VEC (basic_block, heap) *dom_tree_walk;
   unsigned int dom_tree_walk_index;
   VEC (basic_block, heap) *domby;
-  unsigned int i,j;
+  unsigned int i, j, k;
   struct expr **index_map;
   struct expr *expr;
   int *to_bb_head;
   int *bb_size;
   int changed = 0;
+  struct bb_data *data;
+  bitmap_iterator bi;
 
   /* Compute a mapping from expression number (`bitmap_index') to
      hash table entry.  */
@@ -2977,12 +3040,16 @@ hoist_code (void)
 	{
 	  if (TEST_BIT (hoist_vbeout[bb->index], i))
 	    {
+	      int nregs = 0;
+	      enum reg_class pressure_class = NO_REGS;
 	      /* Current expression.  */
 	      struct expr *expr = index_map[i];
 	      /* Number of occurrences of EXPR that can be hoisted to BB.  */
 	      int hoistable = 0;
 	      /* Basic blocks that have occurrences reachable from BB.  */
 	      bitmap_head _from_bbs, *from_bbs = &_from_bbs;
+	      /* Basic blocks through which expr is hoisted.  */
+	      bitmap_head _hoisted_bbs, *hoisted_bbs = &_hoisted_bbs;
 	      /* Occurrences reachable from BB.  */
 	      VEC (occr_t, heap) *occrs_to_hoist = NULL;
 	      /* We want to insert the expression into BB only once, so
@@ -2991,6 +3058,8 @@ hoist_code (void)
 	      occr_t occr;
 
 	      bitmap_initialize (from_bbs, 0);
+	      if (flag_ira_hoist_pressure == 1)
+		bitmap_initialize (hoisted_bbs, 0);
 
 	      /* If an expression is computed in BB and is available at end of
 		 BB, hoist all occurrences dominated by BB to BB.  */
@@ -3045,13 +3114,17 @@ hoist_code (void)
 		    max_distance += (bb_size[dominated->index]
 				     - to_bb_head[INSN_UID (occr->insn)]);
 
+		  pressure_class = get_pressure_class_and_nregs (occr->insn,
+								 &nregs);
+
 		  /* Note if the expression would reach the dominated block
 		     unimpared if it was placed at the end of BB.
 
 		     Keep track of how many times this expression is hoistable
 		     from a dominated block into BB.  */
-		  if (hoist_expr_reaches_here_p (bb, i, dominated, NULL,
-						 max_distance, bb_size))
+		  if (hoist_expr_reaches_here_p (bb, expr, dominated, NULL,
+						 max_distance, bb_size, pressure_class,
+						 &nregs, hoisted_bbs))
 		    {
 		      hoistable++;
 		      VEC_safe_push (occr_t, heap,
@@ -3072,6 +3145,13 @@ hoist_code (void)
 		 to nullify any benefit we get from code hoisting.  */
 	      if (hoistable > 1 && dbg_cnt (hoist_insn))
 		{
+		  /* Update register pressure for basic block to which expr
+		     is hoisted.  */
+		  if (flag_ira_hoist_pressure == 1)
+		    {
+		      data = BB_DATA (bb);
+		      data->max_reg_pressure[pressure_class] += nregs;
+		    }
 		  /* If (hoistable != VEC_length), then there is
 		     an occurrence of EXPR in BB itself.  Don't waste
 		     time looking for LCA in this case.  */
@@ -3089,8 +3169,20 @@ hoist_code (void)
 		    }
 		}
 	      else
+	      {
 		/* Punt, no point hoisting a single occurence.  */
 		VEC_free (occr_t, heap, occrs_to_hoist);
+		/* Restore register pressure of basic block recorded in
+ 		   hoisted_bbs when expr will not be hoisted.  */
+		if (flag_ira_hoist_pressure == 1)
+		  EXECUTE_IF_SET_IN_BITMAP (hoisted_bbs, 0, k, bi)
+		    {
+		      data = BB_DATA (BASIC_BLOCK (k));
+		      data->max_reg_pressure[pressure_class] -= nregs;
+		    }
+	      }
+	      if (flag_ira_hoist_pressure == 1)
+		bitmap_clear (hoisted_bbs);
 
 	      insn_inserted_p = 0;
 
@@ -3147,6 +3239,265 @@ hoist_code (void)
   return changed;
 }
 
+/* Return pressure class and number of needed hard registers (through
+   *NREGS) of register REGNO.  */
+static enum reg_class
+get_regno_pressure_class (int regno, int *nregs)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    {
+      enum reg_class pressure_class;
+
+      pressure_class = reg_allocno_class (regno);
+      pressure_class = ira_pressure_class_translate[pressure_class];
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
+      return pressure_class;
+    }
+  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+	   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
+    {
+      *nregs = 1;
+      return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
+    }
+  else
+    {
+      *nregs = 0;
+      return NO_REGS;
+    }
+}
+
+/* Return pressure class and number of hard registers (through *NREGS)
+   for destination of INSN. */
+static enum reg_class
+get_pressure_class_and_nregs (rtx insn, int *nregs)
+{
+  rtx reg;
+  enum reg_class pressure_class;
+  rtx set = single_set (insn);
+
+  /* Considered invariant insns have only one set.  */
+  gcc_assert (set != NULL_RTX);
+  reg = SET_DEST (set);
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  if (MEM_P (reg))
+    {
+      *nregs = 0;
+      pressure_class = NO_REGS;
+    }
+  else
+    {
+      if (! REG_P (reg))
+	reg = NULL_RTX;
+      if (reg == NULL_RTX)
+	pressure_class = GENERAL_REGS;
+      else
+	{
+	  pressure_class = reg_allocno_class (REGNO (reg));
+	  pressure_class = ira_pressure_class_translate[pressure_class];
+	}
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][GET_MODE (SET_SRC (set))];
+    }
+  return pressure_class;
+}
+
+/* Increase (if INCR_P) or decrease current register pressure for
+   register REGNO.  */
+static void
+change_pressure (int regno, bool incr_p)
+{
+  int nregs;
+  enum reg_class pressure_class;
+
+  pressure_class = get_regno_pressure_class (regno, &nregs);
+  if (! incr_p)
+    curr_reg_pressure[pressure_class] -= nregs;
+  else
+    {
+      curr_reg_pressure[pressure_class] += nregs;
+      if (BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  < curr_reg_pressure[pressure_class])
+	BB_DATA (curr_bb)->max_reg_pressure[pressure_class]
+	  = curr_reg_pressure[pressure_class];
+    }
+}
+
+/* Mark REGNO birth.  */
+static void
+mark_regno_live (int regno)
+{
+  if (!bitmap_set_bit (&curr_regs_live, regno))
+    return;
+  change_pressure (regno, true);
+}
+
+/* Mark REGNO death.  */
+static void
+mark_regno_death (int regno)
+{
+  if (! bitmap_clear_bit (&curr_regs_live, regno))
+    return;
+  change_pressure (regno, false);
+}
+
+/* Mark register REG death.  */
+static void
+mark_reg_death (rtx reg)
+{
+  int regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    mark_regno_death (regno);
+  else
+    {
+      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
+
+      while (regno < last)
+	{
+	  mark_regno_death (regno);
+	  regno++;
+	}
+    }
+}
+
+/* Mark setting register REG.  */
+static void
+mark_reg_store (rtx reg, const_rtx setter ATTRIBUTE_UNUSED,
+		void *data ATTRIBUTE_UNUSED)
+{
+  int regno;
+
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+
+  if (! REG_P (reg))
+    return;
+
+  regs_set[n_regs_set++] = reg;
+
+  regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    mark_regno_live (regno);
+  else
+    {
+      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
+
+      while (regno < last)
+	{
+	  mark_regno_live (regno);
+	  regno++;
+	}
+    }
+}
+
+/* Mark clobbering register REG.  */
+static void
+mark_reg_clobber (rtx reg, const_rtx setter, void *data)
+{
+  if (GET_CODE (setter) == CLOBBER)
+    mark_reg_store (reg, setter, data);
+}
+
+/* Calculate register pressure of each basic block.  */
+static void
+calculate_bb_reg_pressure (void)
+{
+  int i;
+  unsigned int j;
+  bitmap_iterator bi;
+  basic_block bb;
+  rtx insn, link;
+
+  ira_setup_eliminable_regset ();
+  bitmap_initialize (&curr_regs_live, &reg_obstack);
+  FOR_EACH_BB (bb)
+    {
+      curr_bb = bb;
+      bb->aux = xcalloc (1, sizeof (struct bb_data));
+      bitmap_copy (&curr_regs_live, DF_LR_IN (bb));
+      for (i = 0; i < ira_pressure_classes_num; i++)
+	curr_reg_pressure[ira_pressure_classes[i]] = 0;
+      EXECUTE_IF_SET_IN_BITMAP (&curr_regs_live, 0, j, bi)
+	change_pressure (j, true);
+
+      FOR_BB_INSNS (bb, insn)
+	{
+	  if (! NONDEBUG_INSN_P (insn))
+	    continue;
+
+	  n_regs_set = 0;
+	  note_stores (PATTERN (insn), mark_reg_clobber, NULL);
+
+	  /* Mark any registers dead after INSN as dead now.  */
+
+	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+	    if (REG_NOTE_KIND (link) == REG_DEAD)
+	      mark_reg_death (XEXP (link, 0));
+
+	  /* Mark any registers set in INSN as live,
+	     and mark them as conflicting with all other live regs.
+	     Clobbers are processed again, so they conflict with
+	     the registers that are set.  */
+
+	  note_stores (PATTERN (insn), mark_reg_store, NULL);
+
+#ifdef AUTO_INC_DEC
+	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+	    if (REG_NOTE_KIND (link) == REG_INC)
+	      mark_reg_store (XEXP (link, 0), NULL_RTX, NULL);
+#endif
+	  while (n_regs_set-- > 0)
+	    {
+	      rtx note = find_regno_note (insn, REG_UNUSED,
+					  REGNO (regs_set[n_regs_set]));
+	      if (! note)
+		continue;
+
+	      mark_reg_death (XEXP (note, 0));
+	    }
+	}
+    }
+  bitmap_clear (&curr_regs_live);
+
+  if (dump_file == NULL)
+    return;
+  FOR_EACH_BB (bb)
+    {
+      fprintf (dump_file, "\n  Basic block %d Pressure: \n", bb->index);
+      for (i = 0; (int) i < ira_pressure_classes_num; i++)
+	{
+	  enum reg_class pressure_class;
+
+	  pressure_class = ira_pressure_classes[i];
+	  if (BB_DATA (bb)->max_reg_pressure[pressure_class] == 0)
+	    continue;
+	  fprintf (dump_file, " %s=%d", reg_class_names[pressure_class],
+		   BB_DATA (bb)->max_reg_pressure[pressure_class]);
+	}
+      fprintf (dump_file, "\n");
+    }
+}
+
+/* Free struct bb_data in each basic block.  */
+static void
+free_bb_data (void)
+{
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    {
+      struct bb_data *data = BB_DATA (bb);
+      if (!data)
+	continue;
+
+      free (data);
+      bb->aux = NULL;
+    }
+}
+
 /* Top level routine to perform one code hoisting (aka unification) pass
 
    Return nonzero if a change was made.  */
@@ -3166,6 +3517,15 @@ one_code_hoisting_pass (void)
 
   doing_code_hoisting_p = true;
 
+  /* Calculate register pressure for each basic block.  */
+  if (flag_ira_hoist_pressure == 1)
+    {
+      regstat_init_n_sets_and_refs ();
+      ira_set_pseudo_classes (false, dump_file);
+      calculate_bb_reg_pressure ();
+      regstat_free_n_sets_and_refs ();
+    }
+
   /* We need alias.  */
   init_alias_analysis ();
 
@@ -3186,6 +3546,11 @@ one_code_hoisting_pass (void)
       free_code_hoist_mem ();
     }
 
+  if (flag_ira_hoist_pressure == 1)
+    {
+      free_bb_data ();
+      free_reg_info ();
+    }
   free_hash_table (&expr_hash_table);
   free_gcse_mem ();
   obstack_free (&gcse_obstack, NULL);
Index: gcc/loop-invariant.c
===================================================================
--- gcc/loop-invariant.c	(revision 191816)
+++ gcc/loop-invariant.c	(working copy)
@@ -1915,7 +1915,7 @@ move_loop_invariants (void)
     {
       df_analyze ();
       regstat_init_n_sets_and_refs ();
-      ira_set_pseudo_classes (dump_file);
+      ira_set_pseudo_classes (true, dump_file);
       calculate_loop_reg_pressure ();
       regstat_free_n_sets_and_refs ();
     }
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 191816)
+++ gcc/common.opt	(working copy)
@@ -1395,6 +1395,11 @@ Enum(ira_region) String(all) Value(IRA_REGION_ALL)
 EnumValue
 Enum(ira_region) String(mixed) Value(IRA_REGION_MIXED)
 
+fira-hoist-pressure
+Common Report Var(flag_ira_hoist_pressure) Init(-1)
+Use IRA based register pressure calculation
+in hoist optimizations.
+
 fira-loop-pressure
 Common Report Var(flag_ira_loop_pressure)
 Use IRA based register pressure calculation
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	(revision 191816)
+++ gcc/ira.c	(working copy)
@@ -4164,7 +4164,7 @@ ira (FILE *f)
   crtl->is_leaf = leaf_function_p ();
 
   if (resize_reg_info () && flag_ira_loop_pressure)
-    ira_set_pseudo_classes (ira_dump_file);
+    ira_set_pseudo_classes (true, ira_dump_file);
 
   rebuild_p = update_equiv_regs ();
 
Index: gcc/ira.h
===================================================================
--- gcc/ira.h	(revision 191816)
+++ gcc/ira.h	(working copy)
@@ -125,7 +125,7 @@ extern void ira_init (void);
 extern void ira_finish_once (void);
 extern void ira_setup_eliminable_regset (void);
 extern rtx ira_eliminate_regs (rtx, enum machine_mode);
-extern void ira_set_pseudo_classes (FILE *);
+extern void ira_set_pseudo_classes (bool, FILE *);
 extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *);
 
 extern void ira_sort_regnos_for_alter_reg (int *, int, unsigned int *);
Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	(revision 191816)
+++ gcc/ira-costs.c	(working copy)
@@ -2048,9 +2048,10 @@ ira_costs (void)
   ira_free (total_allocno_costs);
 }
 
-/* Entry function which defines classes for pseudos.  */
+/* Entry function which defines classes for pseudos.
+   Set pseudo_classes_defined_p only if DEFINE_PSEUDO_CLASSES is true.  */
 void
-ira_set_pseudo_classes (FILE *dump_file)
+ira_set_pseudo_classes (bool define_pseudo_classes, FILE *dump_file)
 {
   allocno_p = false;
   internal_flag_ira_verbose = flag_ira_verbose;
@@ -2059,7 +2060,9 @@ void
   initiate_regno_cost_classes ();
   find_costs_and_classes (dump_file);
   finish_regno_cost_classes ();
-  pseudo_classes_defined_p = true;
+  if (define_pseudo_classes)
+    pseudo_classes_defined_p = true;
+
   finish_costs ();
 }
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 191816)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2021,6 +2021,11 @@ arm_option_override (void)
       && current_tune->num_prefetch_slots > 0)
     flag_prefetch_loop_arrays = 1;
 
+  /* Enable register pressure hoist when optimizing for size on Thumb1 set.  */
+  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
+      && flag_ira_hoist_pressure == -1)
+    flag_ira_hoist_pressure = 1;
+
   /* Set up parameters to be used in prefetching algorithm.  Do not override the
      defaults unless we are tuning for a core we have researched values for.  */
   if (current_tune->num_prefetch_slots > 0)

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-10-19  6:13 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <50655073.e54c420a.651e.ffffac0fSMTPIN_ADDED@mx.google.com>
2012-09-28  9:24 ` [PATCH RFA] Implement register pressure directed hoist pass Steven Bosscher
2012-09-28 11:05   ` Bin Cheng
2012-09-28 11:26   ` Pedro Alves
2012-09-29 11:54   ` Bin Cheng
2012-10-02 12:58     ` Jeff Law
2012-10-08  3:53       ` Bin Cheng
2012-10-11  7:10       ` Bin Cheng
     [not found]       ` <50766d69.a853420a.2475.fffffbafSMTPIN_ADDED@mx.google.com>
2012-10-11 23:19         ` Steven Bosscher
2012-10-12  3:34           ` Bin Cheng
2012-10-12  8:20           ` Bin Cheng
2012-10-16  8:50             ` Bin Cheng
2012-10-16 17:08               ` Jeff Law
2012-10-19  6:54                 ` Bin Cheng
     [not found]   ` <50669835.e988440a.4421.ffffa17cSMTPIN_ADDED@mx.google.com>
2012-10-01  9:40     ` Steven Bosscher
2012-10-08  2:59       ` Bin Cheng
2012-09-28  8:21 Bin Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).