public inbox for gcc@gcc.gnu.org
From: "Ian Bolton" <bolton@IceraSemi.com>
To: <gcc@gcc.gnu.org>
Subject: Possible IRA improvements for irregular register architectures
Date: Fri, 18 Dec 2009 15:33:00 -0000	[thread overview]
Message-ID: <4D60B0700D1DB54A8C0C6E9BE69163700CC3B6D4@EXCHANGEVS.IceraSemi.local> (raw)
In-Reply-To: <4D60B0700D1DB54A8C0C6E9BE69163700CB4E8EE@EXCHANGEVS.IceraSemi.local>

[-- Attachment #1: Type: text/plain, Size: 2820 bytes --]

Let's assume I have two sub-classes of ALL_REGS: BOTTOM_REGS (c0-c15)
and TOP_CREGS (c16-c31).

Let's also assume I have two main types of instruction: A-type
Instructions, which can use ALL 32 registers, and B-type Instructions,
which can only use the 16 BOTTOM_REGS.

IRA will correctly calculate costs (in ira-costs.c) for allocnos
appearing in B-type instructions, such that TOP_CREGS has a higher
cost than BOTTOM_REGS.  It will also calculate costs for the A-type
instructions such that TOP_CREGS and BOTTOM_REGS have the same cost.

The order of coloring will be determined by the algorithm chosen:
Priority or Chaitin-Briggs.  As each allocno is colored, the costs
will be inspected and the best available hard register will be chosen,
mainly based on the register class costs mentioned above, so allocnos
in B-type Instructions will usually be assigned a BOTTOM_REG if one is
free.  If two or more hard registers share the same cost then
whichever one appears first in the REG_ALLOC_ORDER will be assigned.
(If no hard register can be found, the allocno is assigned memory and
will require a "reload" in a later pass to get a hard register.)

I do not wish to alter the coloring algorithms or the coloring order.
I believe they are good at determining the order to color allocnos,
which dictates the likelihood of being assigned a hard register.  What
I wish to change is the hard register that is assigned, given that the
coloring order has determined that this allocno should get one next.

Why do I need to do this?  Since the BOTTOM_REGS can be used by all
instructions, it makes sense to put them first in the REG_ALLOC_ORDER,
so we minimise the number of registers consumed by a low-pressure
function.  But it also makes sense, in high-pressure functions, to
steer A-type Instructions away from using BOTTOM_REGS so that they are
free for B-type Instructions to use.

To achieve this, I tried altering the costs calculated in ira-costs.c,
either explicitly with various hacks or by altering operand
constraints.  The problem with this approach is that it is static and
independent: the costs are fixed before any coloring order has been
determined and without any awareness of the needs of other allocnos.  I
believe I
require a dynamic way to alter the costs, based on which allocnos
conflict with the allocno currently being colored and which hard
registers are still available at this point.

The patch I have attached here is my first reasonably successful
attempt at this dynamic approach, which has led to performance
improvements on some of our benchmarks and no significant
regressions.

I am hoping it will be useful to others, but I post it more as a
talking point or perhaps to inspire others to come up with better
solutions and share them with me :-)

[-- Attachment #2: bolton_ira_subset_experiments.diff --]
[-- Type: application/octet-stream, Size: 11903 bytes --]

Index: toplev.c
===================================================================
--- toplev.c	(revision 155343)
+++ toplev.c	(working copy)
@@ -287,7 +287,7 @@
 /* Set the default region and algorithm for the integrated register
    allocator.  */
 
-enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_CB;
+enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_PRIORITY;
 enum ira_region flag_ira_region = IRA_REGION_MIXED;
 
 /* Set the default value for -fira-verbose.  */
Index: ira-int.h
===================================================================
--- ira-int.h	(revision 155343)
+++ ira-int.h	(working copy)
@@ -418,6 +418,9 @@
   ira_allocno_t prev_bucket_allocno;
   /* Used for temporary purposes.  */
   int temp;
+  /* Nonzero if this allocno can only use a subset of ALL_REGS,
+     e.g. BOTTOM_REGS.  */
+  int requires_register_subset;
 };
 
 /* All members of the allocno structures should be accessed only
@@ -481,6 +484,7 @@
 #define ALLOCNO_MIN(A) ((A)->min)
 #define ALLOCNO_MAX(A) ((A)->max)
 #define ALLOCNO_CONFLICT_ID(A) ((A)->conflict_id)
+#define ALLOCNO_REQUIRES_REGISTER_SUBSET(A) ((A)->requires_register_subset)
 
 /* Map regno -> allocnos with given regno (see comments for
    allocno member `next_regno_allocno').  */
Index: ira-color.c
===================================================================
--- ira-color.c	(revision 155343)
+++ ira-color.c	(working copy)
@@ -452,6 +452,7 @@
 #ifdef STACK_REGS
   bool no_stack_reg_p;
 #endif
+  int leave_bottom = 0;
 
   ira_assert (! ALLOCNO_ASSIGNED_P (allocno));
   cover_class = ALLOCNO_COVER_CLASS (allocno);
@@ -502,6 +503,20 @@
 				     ALLOCNO_NUM (conflict_allocno)))
 	  {
 	    conflict_cover_class = ALLOCNO_COVER_CLASS (conflict_allocno);
+
+            /* If we don't need the register subset but a conflicting allocno
+               does, we should leave the bottom regs to ensure that allocno
+               gets what it needs.  No change is made in reload, however.  */
+            if (!retry_p
+                && !ALLOCNO_ASSIGNED_P (conflict_allocno)
+                && ALLOCNO_REQUIRES_REGISTER_SUBSET (a) == 0
+                && ALLOCNO_REQUIRES_REGISTER_SUBSET (conflict_allocno))
+              {
+                /* We don't want the bottom regs, but the conflicting allocno
+                   does, so raise the cost of BOTTOM_REGS to err towards TOP_CREGS.  */
+                leave_bottom += 1;
+              }
+       
 	    ira_assert (ira_reg_classes_intersect_p
 			[cover_class][conflict_cover_class]);
 	    if (allocno_coalesced_p)
@@ -600,6 +615,27 @@
 	}
       if (min_cost > cost)
 	min_cost = cost;
+
+      /* Nudge the cost of this reg up because someone else wants this one.
+
+         Note: if we only penalised BOTTOM_REGS then we would skip over c2, c3, c4, c5, c6
+               and go straight for c16 because it is cheaper than currently non-allocated
+               callee-save TOP_CREGS.  By penalising c16 too, the cost for c2, c3, c4, c5,
+               c6 and c16 will all be the same AND lower than any of the currently
+               unallocated TOP_CREGS, so we will take the lowest number (c2).  This is good
+               for limiting the registers we use up in low-pressure cases. */
+      if (leave_bottom)
+        {
+          /* Always add 1 to full_cost of BOTTOM_REGS
+             Only add 1 to full_cost of c16 when c17 (the first callee-save TOP_CREG) has
+             not yet been allocated. This effectively means anyone can get c1-c6 until we
+             grab our first callee-save TOP_CREG; after that, we try to reserve c1-c6 for
+             instructions that want them specifically by encouraging use of c16, c17 first. */
+          if (hard_regno < 16 ||
+              ((hard_regno == 16) && (!allocated_hardreg_p[17])))
+            full_cost += 1;
+        }
+
       if (min_full_cost > full_cost)
 	{
 	  min_full_cost = full_cost;
@@ -2853,14 +2889,42 @@
 ira_reassign_pseudos (int *spilled_pseudo_regs, int num,
 		      HARD_REG_SET bad_spill_regs,
 		      HARD_REG_SET *pseudo_forbidden_regs,
-		      HARD_REG_SET *pseudo_previous_regs,  bitmap spilled)
+		      HARD_REG_SET *pseudo_previous_regs,
+		      bitmap spilled)
 {
   int i, m, n, regno;
   bool changed_p;
   ira_allocno_t a, conflict_a;
   HARD_REG_SET forbidden_regs;
   ira_allocno_conflict_iterator aci;
+  bitmap temp = BITMAP_ALLOC (NULL);
 
+  /* Add pseudos which conflict with pseudos already in
+     SPILLED_PSEUDO_REGS to SPILLED_PSEUDO_REGS.  This is preferable
+     to allocating in two steps as some of the conflicts might have
+     a higher priority than the pseudos passed in SPILLED_PSEUDO_REGS.  */
+  for (i = 0; i < num; i++)
+    bitmap_set_bit (temp, spilled_pseudo_regs[i]);
+
+  for (i = 0, n = num; i < n; i++)
+    {
+      int regno = spilled_pseudo_regs[i];
+      bitmap_set_bit (temp, regno);
+
+      a = ira_regno_allocno_map[regno];
+      FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci)
+	if (ALLOCNO_HARD_REGNO (conflict_a) < 0
+	    && ! ALLOCNO_DONT_REASSIGN_P (conflict_a)
+	    && ! bitmap_bit_p (temp, ALLOCNO_REGNO (conflict_a)))
+	  {
+	    spilled_pseudo_regs[num++] = ALLOCNO_REGNO (conflict_a);
+	    bitmap_set_bit (temp, ALLOCNO_REGNO (conflict_a));
+	    /* ?!? This seems wrong.  */
+	    bitmap_set_bit (consideration_allocno_bitmap,
+			    ALLOCNO_NUM (conflict_a));
+	  }
+    }
+
   if (num > 1)
     qsort (spilled_pseudo_regs, num, sizeof (int), pseudo_reg_compare);
   changed_p = false;
@@ -2878,7 +2942,7 @@
       ira_assert (reg_renumber[regno] < 0);
       if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
 	fprintf (ira_dump_file,
-		 "      Spill %d(a%d), cost=%d", regno, ALLOCNO_NUM (a),
+		 "      Try Assign %d(a%d), cost=%d", regno, ALLOCNO_NUM (a),
 		 ALLOCNO_MEMORY_COST (a)
 		 - ALLOCNO_COVER_CLASS_COST (a));
       allocno_reload_assign (a, forbidden_regs);
@@ -2887,60 +2951,8 @@
 	  CLEAR_REGNO_REG_SET (spilled, regno);
 	  changed_p = true;
 	}
-      else
-	spilled_pseudo_regs[m++] = regno;
     }
-  if (m == 0)
-    return changed_p;
-  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-    {
-      fprintf (ira_dump_file, "      Spilled regs");
-      for (i = 0; i < m; i++)
-	fprintf (ira_dump_file, " %d", spilled_pseudo_regs[i]);
-      fprintf (ira_dump_file, "\n");
-    }
-  /* Try to assign hard registers to pseudos conflicting with ones
-     from SPILLED_PSEUDO_REGS.  */
-  for (i = n = 0; i < m; i++)
-    {
-      regno = spilled_pseudo_regs[i];
-      a = ira_regno_allocno_map[regno];
-      FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci)
-	if (ALLOCNO_HARD_REGNO (conflict_a) < 0
-	    && ! ALLOCNO_DONT_REASSIGN_P (conflict_a)
-	    && ! bitmap_bit_p (consideration_allocno_bitmap,
-			       ALLOCNO_NUM (conflict_a)))
-	  {
-	    sorted_allocnos[n++] = conflict_a;
-	    bitmap_set_bit (consideration_allocno_bitmap,
-			    ALLOCNO_NUM (conflict_a));
-	  }
-    }
-  if (n != 0)
-    {
-      setup_allocno_priorities (sorted_allocnos, n);
-      qsort (sorted_allocnos, n, sizeof (ira_allocno_t),
-	     allocno_priority_compare_func);
-      for (i = 0; i < n; i++)
-	{
-	  a = sorted_allocnos[i];
-	  regno = ALLOCNO_REGNO (a);
-	  COPY_HARD_REG_SET (forbidden_regs, bad_spill_regs);
-	  IOR_HARD_REG_SET (forbidden_regs, pseudo_forbidden_regs[regno]);
-	  IOR_HARD_REG_SET (forbidden_regs, pseudo_previous_regs[regno]);
-	  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-	    fprintf (ira_dump_file,
-		     "        Try assign %d(a%d), cost=%d",
-		     regno, ALLOCNO_NUM (a),
-		     ALLOCNO_MEMORY_COST (a)
-		     - ALLOCNO_COVER_CLASS_COST (a));
-	  if (allocno_reload_assign (a, forbidden_regs))
-	    {
-	      changed_p = true;
-	      bitmap_clear_bit (spilled, regno);
-	    }
-	}
-    }
+  BITMAP_FREE (temp);
   return changed_p;
 }
 
Index: ira-build.c
===================================================================
--- ira-build.c	(revision 155343)
+++ ira-build.c	(working copy)
@@ -482,6 +482,7 @@
   ALLOCNO_LIVE_RANGES (a) = NULL;
   ALLOCNO_MIN (a) = INT_MAX;
   ALLOCNO_MAX (a) = -1;
+  ALLOCNO_REQUIRES_REGISTER_SUBSET (a) = 0;
   ALLOCNO_CONFLICT_ID (a) = ira_allocnos_num;
   VEC_safe_push (ira_allocno_t, heap, allocno_vec, a);
   ira_allocnos = VEC_address (ira_allocno_t, allocno_vec);
Index: ira-costs.c
===================================================================
--- ira-costs.c	(revision 155343)
+++ ira-costs.c	(working copy)
@@ -637,6 +637,14 @@
 	    struct costs *pp = op_costs[i], *qq = this_op_costs[i];
 	    int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);
 
+            /* Catch any allocnos that want BOTTOM_REGS and record this for later.  */
+            if (allocno_pref && allocno_pref[ALLOCNO_NUM
+				       (ira_curr_regno_allocno_map
+					[REGNO (ops[i])])] == BOTTOM_REGS)
+              {
+                ALLOCNO_REQUIRES_REGISTER_SUBSET (ira_curr_regno_allocno_map[REGNO (ops[i])]) = 1;
+              }
+
 	    pp->mem_cost = MIN (pp->mem_cost,
 				(qq->mem_cost + op_cost_add) * scale);
 
Index: reload1.c
===================================================================
--- reload1.c	(revision 155343)
+++ reload1.c	(working copy)
@@ -48,6 +48,7 @@
 #include "df.h"
 #include "target.h"
 #include "emit-rtl.h"
+#include "ira-int.h"
 
 /* This file contains the reload pass of the compiler, which is
    run after register allocation has been done.  It checks that
@@ -5720,6 +5721,48 @@
   return 0;
 }
 
+/* Return nonzero if REGNO is a particularly bad choice for reloading X.
+   (Part of the Jeff Law hack) */
+static int
+ira_bad_reload_regno_1 (int regno, rtx x)
+{
+   int x_regno;
+   ira_allocno_t a;
+   enum reg_class pref;
+
+   /* We only deal with pseudo regs.  */
+   if (! x || GET_CODE (x) != REG)
+     return 0;
+
+   x_regno = REGNO (x);
+   if (x_regno < FIRST_PSEUDO_REGISTER)
+     return 0;
+
+   /* If the pseudo prefers REGNO explicitly, then do not consider
+      REGNO a bad spill choice.  */
+   pref = reg_preferred_class (x_regno);
+   if (reg_class_size[pref] == 1
+       && TEST_HARD_REG_BIT (reg_class_contents[pref], regno))
+     return 0;
+
+   /* If the pseudo conflicts with REGNO, then we consider REGNO a
+      poor choice for a reload regno.  */
+   a = ira_regno_allocno_map[x_regno];
+   if (TEST_HARD_REG_BIT (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a), regno))
+     return 1;
+
+   return 0;
+}
+
+/* Return nonzero if REGNO is a particularly bad choice for reloading
+    IN or OUT.  (Part of the Jeff Law hack) */
+int
+ira_bad_reload_regno (int regno, rtx in, rtx out)
+{
+   return (ira_bad_reload_regno_1 (regno, in)
+           || ira_bad_reload_regno_1 (regno, out));
+}
+
 /* Find a spill register to use as a reload register for reload R.
    LAST_RELOAD is nonzero if this is the last reload for the insn being
    processed.
@@ -5761,8 +5804,11 @@
      run out of reload regs.  Suppose we have three reloads, and
      reloads A and B can share regs.  These need two regs.
      Suppose A and B are given different regs.
-     That leaves none for C.  */
-  for (pass = 0; pass < 2; pass++)
+     That leaves none for C.
+
+     NOTE: There are now three passes in accordance with the Jeff
+     Law hack.  */
+  for (pass = 0; pass < 3; pass++)
     {
       /* I is the index in spill_regs.
 	 We advance it round-robin between insns to use all spill regs
@@ -5801,6 +5847,11 @@
 		      && ! TEST_HARD_REG_BIT (reload_reg_used_for_inherit,
 					      regnum))))
 	    {
+              /* BEGIN: Jeff Law's hack */
+              if (pass == 1
+                  && ira_bad_reload_regno (regnum, rld[r].in, rld[r].out))
+                continue;
+              /* END: Jeff Law's hack */
 	      int nr = hard_regno_nregs[regnum][rld[r].mode];
 	      /* Avoid the problem where spilling a GENERAL_OR_FP_REG
 		 (on 68000) got us two FP regs.  If NR is 1,


Thread overview: 4+ messages
2009-12-14 11:15 Understanding IRA (The Story so far) Ian Bolton
2009-12-18 15:33 ` Ian Bolton [this message]
2010-01-04 14:19   ` Possible IRA improvements for irregular register architectures Ian Bolton
2010-01-11 14:34     ` Ian Bolton
