* RE: Understanding IRA (The Story so far)
@ 2009-12-14 11:15 Ian Bolton
  2009-12-18 15:33 ` Possible IRA improvements for irregular register architectures Ian Bolton
  0 siblings, 1 reply; 4+ messages in thread

From: Ian Bolton @ 2009-12-14 11:15 UTC (permalink / raw)
To: gcc

For those who have been following my adventure in the world of IRA, or just want to take advantage of some of the time I've spent on this, here's a detailed recap of my exploits and my current status (which I'm very happy with):

I initially thought IRA was calculating costs wrongly for our architecture, so I experimented with adding more costs in different places. This effectively raised the priority of coloring certain allocnos and led to good performance on some benchmarks, but it was a complete hack rather than an acceptable solution, so I went back to the drawing board.

After chatting with Vlad and Jeff, I realised IRA was calculating costs correctly; the real issue was that the costs only reflect the preferences of each allocno, as opposed to their "agnosticism" (i.e. not caring which register they get). Instructions that will happily take any register would take whichever one you showed them first, which meant that high-priority agnostic instructions (colored first) were pinching BOTTOM_REGS before the lower-priority special instructions got a chance.

To address this, I came up with the simple solution of changing the REG_ALLOC_ORDER: I show TOP_CREGS (the top half of our register set) to agnostic instructions before the callee-save BOTTOM_REGS (c7-c15) by default, so they take TOP_CREGS when all other costs are equal. Meanwhile, allocnos requiring BOTTOM_REGS were still given them, because those registers had been left unallocated and their costs were lower than the TOP_CREGS.
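In sketch form, the mechanism being relied on here is just "when costs tie, the register earlier in REG_ALLOC_ORDER wins". The register numbering below and the helper names are mine, made up for illustration; the real order lives in the target's REG_ALLOC_ORDER macro:

```c
#include <assert.h>

/* Hypothetical reordered allocation order: caller-save BOTTOM_REGS
   (c1-c6) first, then TOP_CREGS (c16-c31), and only then the
   callee-save BOTTOM_REGS (c7-c15).  */
static const int reg_alloc_order[] = {
  1, 2, 3, 4, 5, 6,                      /* caller-save BOTTOM_REGS */
  16, 17, 18, 19, 20, 21, 22, 23,        /* TOP_CREGS */
  24, 25, 26, 27, 28, 29, 30, 31,
  7, 8, 9, 10, 11, 12, 13, 14, 15        /* callee-save BOTTOM_REGS */
};

/* Position of HARD_REGNO in the order; lower means preferred.  */
static int order_pos (int hard_regno)
{
  int i;
  int n = (int) (sizeof reg_alloc_order / sizeof reg_alloc_order[0]);
  for (i = 0; i < n; i++)
    if (reg_alloc_order[i] == hard_regno)
      return i;
  return n;  /* not allocatable: sort it last */
}

/* When two hard registers have equal cost, the allocator takes the
   one that appears first in REG_ALLOC_ORDER.  */
int tie_break (int regno_a, int regno_b)
{
  return order_pos (regno_a) < order_pos (regno_b) ? regno_a : regno_b;
}
```

With this order, an agnostic allocno whose costs are flat across ALL_REGS consumes c1-c6 and the TOP_CREGS before touching c7-c15, leaving the callee-save BOTTOM_REGS for allocnos whose costs actually favour them.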
This resulted in good performance improvements on some key benchmarks, where previously agnostic instructions had been unwittingly using up these special BOTTOM_REGS, leading us to allocate TOP_CREGS for operands in special instructions (which then required spills and moves to get a BOTTOM_REG in time to do the special instruction).

The main drawback of this alternate REG_ALLOC_ORDER was that picking a TOP_CREG reduces the chance of reusing a callee-save register, because a TOP_CREG does not work in all instructions, and I ended up with regressions on some low-pressure benchmarks because we used more callee-save registers than before.

I then tried various other ways of keeping the performance improvements on the high register pressure benchmarks while eliminating the regressions on the low-pressure ones:

* I tried using '?' constraints on agnostic instructions, so that they erred towards TOP_CREGS instead of callee-save BOTTOM_REGS, but this biased them too far away from the more reusable BOTTOM_REGS (even when there was no need to avoid them) and I subsequently got regressions;

* I tried restricting the effect of the '?' by not using it for the register preferencing pass and by only adding on the alt_cost when I had determined that the allocno never requires a BOTTOM_REG. This got some decent performance, but I was still penalising agnostic instructions for using BOTTOM_REGS regardless of whether other allocnos were demanding them, so I got regressions;

* I tried coloring the allocnos that need BOTTOM_REGS first, which maximises the reuse opportunities, but this was essentially a rejection of the existing tried-and-tested coloring heuristics in IRA and consequently led to some regressions;

* I tried combining the new REG_ALLOC_ORDER with a modified coloring heuristic that uses the need for BOTTOM_REGS as a tie-breaker when two allocnos are otherwise indistinguishable, but this overrode the ALLOCNO_NUM ordering (which I guess represents live range starting points?), so I got some regressions.

Fortunately, I was hit by another epiphany on Monday night: I could emulate the new REG_ALLOC_ORDER dynamically within assign_hard_reg, and only when necessary. When allocating for an agnostic allocno, if it conflicted with other allocnos that needed BOTTOM_REGS, I added 1 to the full_cost of each of the BOTTOM_REGS so this allocno took a TOP_CREG.

This gave me my best performance overall, but there were still some regressions on low-pressure benchmarks. Whilst investigating them, I realised I had gone beyond my original new REG_ALLOC_ORDER, because I was now avoiding all BOTTOM_REGS instead of just the callee-save ones, so I changed assign_hard_reg to add 1 only to each callee-save BOTTOM_REG. This means we happily use up the caller-save BOTTOM_REGS (c1-c6), regardless of whether the allocno is agnostic or not, which leads to more reusability in low-pressure cases (whereas avoiding all BOTTOM_REGS for an agnostic instruction decreases reusability).

Note that I had been experimenting with both the Priority algorithm and Chaitin-Briggs up to this point, and I have decided to stick with Priority for now because we are using Priority already, which minimises the disruption and reduces the risk of regressions on untested inputs. (Results generally show CB to give better performance and smaller code size, but it suffers badly in very high pressure cases and spills more than Priority.)

As of yesterday, I had two versions of the Priority algorithm IRA: one that biases against all BOTTOM_REGS when an agnostic allocno conflicts with one that wants BOTTOM_REGS, and one that biases against only the callee-save BOTTOM_REGS. The former got me better performance on high-pressure benchmarks and the latter got me better performance on the ones with some low pressure.
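The difference between those two versions can be sketched as a pair of cost-penalty predicates (helper names and the concrete register split are my own illustration, not code from the patch; the real `+1` lands on `full_cost` inside assign_hard_reg):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout from the post: BOTTOM_REGS are c0-c15, of which
   c1-c6 are caller-save and c7-c15 are callee-save.  */
static bool bottom_reg_p (int hard_regno)
{
  return hard_regno >= 0 && hard_regno < 16;
}

static bool callee_save_bottom_p (int hard_regno)
{
  return hard_regno >= 7 && hard_regno <= 15;
}

/* Version 1: when an agnostic allocno conflicts with one that wants
   BOTTOM_REGS, nudge it away from every BOTTOM_REG.  Good under high
   pressure, costly under low pressure.  */
int penalty_all_bottom (int hard_regno, bool conflict_needs_bottom)
{
  return (conflict_needs_bottom && bottom_reg_p (hard_regno)) ? 1 : 0;
}

/* Version 2: nudge only against the callee-save BOTTOM_REGS, so the
   caller-save c1-c6 stay freely reusable.  Good under low pressure.  */
int penalty_callee_save_bottom (int hard_regno, bool conflict_needs_bottom)
{
  return (conflict_needs_bottom && callee_save_bottom_p (hard_regno)) ? 1 : 0;
}
```

Either penalty is added to a register's full_cost only while the allocno being colored is agnostic and conflicts with a BOTTOM_REGS-hungry allocno; with no such conflict, both versions leave the costs untouched.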
Either was an improvement over what I started with (default IRA), but I wanted the best of both worlds, so I needed a hybrid that biases against only the callee-save BOTTOM_REGS under low pressure and against all BOTTOM_REGS under high pressure.

Sure, I could have put in some ugly hack that looks at the register pressure number and picks one or the other, but instead I came up with a more intelligent solution, which takes advantage of our single caller-save TOP_CREG (c16), which appears after the caller-save BOTTOM_REGS in our REG_ALLOC_ORDER. Now, when an agnostic allocno conflicts with one that wants BOTTOM_REGS, I add 1 to the cost of c16 too, so we still give out c1, c2, etc. early on (because they have lower costs than all the callee-save regs and appear earlier in the REG_ALLOC_ORDER than c16) to whichever allocno gets there first. But later, when all the caller-save regs are allocated, we will use our first callee-save reg (agnostic allocnos will take c17; ones that need BOTTOM_REGS will take c7). At that point, the already-allocated callee-save reg becomes cheaper than the caller-save regs, and we use it in preference to them (including c1-c6). The result is that, under high pressure, we hold back all the BOTTOM_REGS for those that need them.

The performance of this version is my best ever and I am very happy with it. It doesn't quite reach the highest performance I've seen on one of the high-pressure benchmarks, because we may still be giving up some BOTTOM_REGS to agnostic instructions early on, but it's pretty good otherwise.

Thanks for reading and happy allocating!
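The hybrid described above can be condensed into a single predicate. This is a sketch under the register numbering from the post (c0-c15 BOTTOM_REGS, c16 the lone caller-save TOP_CREG, c17 the first callee-save TOP_CREG); `allocated_p` is my stand-in for the allocator's record of which hard registers have already been handed out:

```c
#include <assert.h>
#include <stdbool.h>

/* +1 full_cost nudge applied while an agnostic allocno conflicts with
   one that needs BOTTOM_REGS.  */
int hybrid_penalty (int hard_regno, const bool *allocated_p)
{
  /* Always nudge away from BOTTOM_REGS (c0-c15).  */
  if (hard_regno < 16)
    return 1;

  /* Keep c16 exactly as expensive as the penalised c1-c6 until the
     first callee-save TOP_CREG (c17) has been allocated; after that,
     c16 becomes the cheap escape hatch for agnostic allocnos.  */
  if (hard_regno == 16 && !allocated_p[17])
    return 1;

  return 0;
}
```

Early on, the penalised c1-c6 and c16 all tie, so the earliest of them in REG_ALLOC_ORDER (c1, then c2, ...) is taken, which keeps register usage low in low-pressure functions; once c17 is in use, agnostic allocnos spread across the TOP_CREGS while c7-c15 stay reserved for the instructions that need them.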
* Possible IRA improvements for irregular register architectures
  2009-12-14 11:15 Understanding IRA (The Story so far) Ian Bolton
@ 2009-12-18 15:33 ` Ian Bolton
  2010-01-04 14:19   ` Ian Bolton
  0 siblings, 1 reply; 4+ messages in thread

From: Ian Bolton @ 2009-12-18 15:33 UTC (permalink / raw)
To: gcc

[-- Attachment #1: Type: text/plain, Size: 2820 bytes --]

Let's assume I have two sub-classes of ALL_REGS: BOTTOM_REGS (c0-c15) and TOP_CREGS (c16-c31).

Let's also assume I have two main types of instruction: A-type Instructions, which can use ALL 32 registers, and B-type Instructions, which can only use the 16 BOTTOM_REGS.

IRA will correctly calculate costs (in ira-costs.c) for allocnos appearing in B-type instructions, such that TOP_CREGS has a higher cost than BOTTOM_REGS. It will also calculate costs for the A-type instructions such that TOP_CREGS and BOTTOM_REGS have the same cost.

The order of coloring will be determined by the algorithm chosen: Priority or Chaitin-Briggs. As each allocno is colored, the costs will be inspected and the best available hard register will be chosen, mainly based on the register class costs mentioned above, so allocnos in B-type Instructions will usually be assigned a BOTTOM_REG if one is free. If two or more hard registers share the same cost then whichever one appears first in the REG_ALLOC_ORDER will be assigned. (If no hard register can be found, the allocno is assigned memory and will require a "reload" in a later pass to get a hard register.)

I do not wish to alter the coloring algorithms or the coloring order. I believe they are good at determining the order to color allocnos, which dictates the likelihood of being assigned a hard register. What I wish to change is the hard register that is assigned, given that the coloring order has determined that this allocno should get one next.

Why do I need to do this?
Since the BOTTOM_REGS can be used by all instructions, it makes sense to put them first in the REG_ALLOC_ORDER, so we minimise the number of registers consumed by a low-pressure function. But it also makes sense, in high-pressure functions, to steer A-type Instructions away from BOTTOM_REGS so that they remain free for B-type Instructions to use.

To achieve this, I tried altering the costs calculated in ira-costs.c, either explicitly with various hacks or by altering operand constraints. The problem with this approach was that it is static and independent, occurring before any coloring order has been determined and without any awareness of the needs of other allocnos. I believe I require a dynamic way to alter the costs, based on which allocnos conflict with the allocno currently being colored and which hard registers are still available at that point.

The patch I have attached here is my first reasonably successful attempt at this dynamic approach, which has led to performance improvements on some of our benchmarks and no significant regressions. I am hoping it will be useful to others, but I post it more as a talking point, or perhaps to inspire others to come up with better solutions and share them with me :-)

[-- Attachment #2: bolton_ira_subset_experiments.diff --]
[-- Type: application/octet-stream, Size: 11903 bytes --]

Index: toplev.c
===================================================================
--- toplev.c    (revision 155343)
+++ toplev.c    (working copy)
@@ -287,7 +287,7 @@
 /* Set the default region and algorithm for the integrated
    register allocator.  */
-enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_CB;
+enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_PRIORITY;
 enum ira_region flag_ira_region = IRA_REGION_MIXED;

 /* Set the default value for -fira-verbose.
    */

Index: ira-int.h
===================================================================
--- ira-int.h   (revision 155343)
+++ ira-int.h   (working copy)
@@ -418,6 +418,9 @@
   ira_allocno_t prev_bucket_allocno;
   /* Used for temporary purposes.  */
   int temp;
+  /* make note that this allocno can only use a subset of ALL_REGS.
+     e.g. BOTTOM_REGS */
+  int requires_register_subset;
 };

 /* All members of the allocno structures should be accessed only
@@ -481,6 +484,7 @@
 #define ALLOCNO_MIN(A) ((A)->min)
 #define ALLOCNO_MAX(A) ((A)->max)
 #define ALLOCNO_CONFLICT_ID(A) ((A)->conflict_id)
+#define ALLOCNO_REQUIRES_REGISTER_SUBSET(A) ((A)->requires_register_subset)

 /* Map regno -> allocnos with given regno (see comments for allocno
    member `next_regno_allocno').  */

Index: ira-color.c
===================================================================
--- ira-color.c (revision 155343)
+++ ira-color.c (working copy)
@@ -452,6 +452,7 @@
 #ifdef STACK_REGS
   bool no_stack_reg_p;
 #endif
+  int leave_bottom = 0;

   ira_assert (! ALLOCNO_ASSIGNED_P (allocno));
   cover_class = ALLOCNO_COVER_CLASS (allocno);
@@ -502,6 +503,20 @@
             ALLOCNO_NUM (conflict_allocno)))
           {
             conflict_cover_class = ALLOCNO_COVER_CLASS (conflict_allocno);
+
+            /* If we don't need the register subset, but a conflicting allocno does,
+               we should leave the bottom regs, to ensure that allocno gets what it needs.
+               No change is made in reload pass, however.  */
+            if (!retry_p &&
+                !ALLOCNO_ASSIGNED_P (conflict_allocno) &&
+                (ALLOCNO_REQUIRES_REGISTER_SUBSET (a) == 0) &&
+                (ALLOCNO_REQUIRES_REGISTER_SUBSET (conflict_allocno)))
+              {
+                /* WE DON'T WANT BOTTOM, BUT THEY DO ...
+                   .... SO INCREASE THE COSTS FOR BOTTOM_REGS SO WE ERR TOWARDS TOP_REGS */
+                leave_bottom += 1;
+              }
+
             ira_assert (ira_reg_classes_intersect_p
                         [cover_class][conflict_cover_class]);
             if (allocno_coalesced_p)
@@ -600,6 +615,27 @@
         }
       if (min_cost > cost)
         min_cost = cost;
+
+      /* Nudge the cost of this reg up because someone else wants this one.
+
+         Note: if we only penalised BOTTOM_REGS then we would skip over c2, c3, c4, c5, c6
+         and go straight for c16 because it is cheaper than currently non-allocated
+         callee-save TOP_CREGS.  By penalising c16 too, the cost for c2, c3, c4, c5,
+         c6 and c16 will all be the same AND lower than any of the currently
+         unallocated TOP_CREGS, so we will take the lowest number (c2).  This is good
+         for limiting the registers we use up in low-pressure cases.  */
+      if (leave_bottom)
+        {
+          /* Always add 1 to full_cost of BOTTOM_REGS.
+             Only add 1 to full_cost of c16 when c17 (the first callee-save TOP_CREG) has
+             not yet been allocated.  This effectively means anyone can get c1-c6 until we
+             grab our first callee-save TOP_CREG; after that, we try to reserve c1-c6 for
+             instructions that want them specifically by encouraging use of c16, c17 first.  */
+          if (hard_regno < 16 ||
+              ((hard_regno == 16) && (!allocated_hardreg_p[17])))
+            full_cost += 1;
+        }
+
       if (min_full_cost > full_cost)
         {
           min_full_cost = full_cost;
@@ -2853,14 +2889,42 @@
 ira_reassign_pseudos (int *spilled_pseudo_regs, int num,
                      HARD_REG_SET bad_spill_regs,
                      HARD_REG_SET *pseudo_forbidden_regs,
-                     HARD_REG_SET *pseudo_previous_regs, bitmap spilled)
+                     HARD_REG_SET *pseudo_previous_regs,
+                     bitmap spilled)
 {
   int i, m, n, regno;
   bool changed_p;
   ira_allocno_t a, conflict_a;
   HARD_REG_SET forbidden_regs;
   ira_allocno_conflict_iterator aci;
+  bitmap temp = BITMAP_ALLOC (NULL);

+  /* Add pseudos which conflict with pseudos already in
+     SPILLED_PSEUDO_REGS to SPILLED_PSEUDO_REGS.  This is preferable
+     to allocating in two steps as some of the conflicts might have
+     a higher priority than the pseudos passed in SPILLED_PSEUDO_REGS.
     */
+  for (i = 0; i < num; i++)
+    bitmap_set_bit (temp, spilled_pseudo_regs[i]);
+
+  for (i = 0, n = num; i < n; i++)
+    {
+      int regno = spilled_pseudo_regs[i];
+      bitmap_set_bit (temp, regno);
+
+      a = ira_regno_allocno_map[regno];
+      FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci)
+        if (ALLOCNO_HARD_REGNO (conflict_a) < 0
+            && ! ALLOCNO_DONT_REASSIGN_P (conflict_a)
+            && ! bitmap_bit_p (temp, ALLOCNO_REGNO (conflict_a)))
+          {
+            spilled_pseudo_regs[num++] = ALLOCNO_REGNO (conflict_a);
+            bitmap_set_bit (temp, ALLOCNO_REGNO (conflict_a));
+            /* ?!? This seems wrong.  */
+            bitmap_set_bit (consideration_allocno_bitmap,
+                            ALLOCNO_NUM (conflict_a));
+          }
+    }
+
   if (num > 1)
     qsort (spilled_pseudo_regs, num, sizeof (int), pseudo_reg_compare);
   changed_p = false;
@@ -2878,7 +2942,7 @@
       ira_assert (reg_renumber[regno] < 0);
       if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
        fprintf (ira_dump_file,
-                "      Spill %d(a%d), cost=%d", regno, ALLOCNO_NUM (a),
+                "      Try Assign %d(a%d), cost=%d", regno, ALLOCNO_NUM (a),
                 ALLOCNO_MEMORY_COST (a)
                 - ALLOCNO_COVER_CLASS_COST (a));
       allocno_reload_assign (a, forbidden_regs);
@@ -2887,60 +2951,8 @@
          CLEAR_REGNO_REG_SET (spilled, regno);
          changed_p = true;
        }
-      else
-       spilled_pseudo_regs[m++] = regno;
     }
-  if (m == 0)
-    return changed_p;
-  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-    {
-      fprintf (ira_dump_file, "      Spilled regs");
-      for (i = 0; i < m; i++)
-       fprintf (ira_dump_file, " %d", spilled_pseudo_regs[i]);
-      fprintf (ira_dump_file, "\n");
-    }
-  /* Try to assign hard registers to pseudos conflicting with ones
-     from SPILLED_PSEUDO_REGS.  */
-  for (i = n = 0; i < m; i++)
-    {
-      regno = spilled_pseudo_regs[i];
-      a = ira_regno_allocno_map[regno];
-      FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci)
-       if (ALLOCNO_HARD_REGNO (conflict_a) < 0
-           && ! ALLOCNO_DONT_REASSIGN_P (conflict_a)
-           && !
 bitmap_bit_p (consideration_allocno_bitmap,
-                           ALLOCNO_NUM (conflict_a)))
-         {
-           sorted_allocnos[n++] = conflict_a;
-           bitmap_set_bit (consideration_allocno_bitmap,
-                           ALLOCNO_NUM (conflict_a));
-         }
-    }
-  if (n != 0)
-    {
-      setup_allocno_priorities (sorted_allocnos, n);
-      qsort (sorted_allocnos, n, sizeof (ira_allocno_t),
-            allocno_priority_compare_func);
-      for (i = 0; i < n; i++)
-       {
-         a = sorted_allocnos[i];
-         regno = ALLOCNO_REGNO (a);
-         COPY_HARD_REG_SET (forbidden_regs, bad_spill_regs);
-         IOR_HARD_REG_SET (forbidden_regs, pseudo_forbidden_regs[regno]);
-         IOR_HARD_REG_SET (forbidden_regs, pseudo_previous_regs[regno]);
-         if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-           fprintf (ira_dump_file,
-                    "        Try assign %d(a%d), cost=%d",
-                    regno, ALLOCNO_NUM (a),
-                    ALLOCNO_MEMORY_COST (a)
-                    - ALLOCNO_COVER_CLASS_COST (a));
-         if (allocno_reload_assign (a, forbidden_regs))
-           {
-             changed_p = true;
-             bitmap_clear_bit (spilled, regno);
-           }
-       }
-    }
+  BITMAP_FREE (temp);
   return changed_p;
 }

Index: ira-build.c
===================================================================
--- ira-build.c (revision 155343)
+++ ira-build.c (working copy)
@@ -482,6 +482,7 @@
   ALLOCNO_LIVE_RANGES (a) = NULL;
   ALLOCNO_MIN (a) = INT_MAX;
   ALLOCNO_MAX (a) = -1;
+  ALLOCNO_REQUIRES_REGISTER_SUBSET (a) = 0;
   ALLOCNO_CONFLICT_ID (a) = ira_allocnos_num;
   VEC_safe_push (ira_allocno_t, heap, allocno_vec, a);
   ira_allocnos = VEC_address (ira_allocno_t, allocno_vec);

Index: ira-costs.c
===================================================================
--- ira-costs.c (revision 155343)
+++ ira-costs.c (working copy)
@@ -637,6 +637,14 @@
              struct costs *pp = op_costs[i], *qq = this_op_costs[i];
              int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);

+             /* catch any allocnos that want bottom_regs and record this for later */
+             if (allocno_pref && allocno_pref[ALLOCNO_NUM
+                                              (ira_curr_regno_allocno_map
+                                               [REGNO (ops[i])])] == BOTTOM_REGS)
+               {
+                 ALLOCNO_REQUIRES_REGISTER_SUBSET (ira_curr_regno_allocno_map[REGNO
 (ops[i])]) = 1;
+               }
+
              pp->mem_cost = MIN (pp->mem_cost,
                                  (qq->mem_cost + op_cost_add) * scale);

Index: reload1.c
===================================================================
--- reload1.c   (revision 155343)
+++ reload1.c   (working copy)
@@ -48,6 +48,7 @@
 #include "df.h"
 #include "target.h"
 #include "emit-rtl.h"
+#include "ira-int.h"

 /* This file contains the reload pass of the compiler, which is run
    after register allocation has been done.  It checks that
@@ -5720,6 +5721,48 @@
   return 0;
 }

+/* Return nonzero if REGNO is a particularly bad choice for reloading X.
+   (Part of the Jeff Law hack)  */
+static int
+ira_bad_reload_regno_1 (int regno, rtx x)
+{
+  int x_regno;
+  ira_allocno_t a;
+  enum reg_class pref;
+
+  /* We only deal with pseudo regs.  */
+  if (! x || GET_CODE (x) != REG)
+    return 0;
+
+  x_regno = REGNO (x);
+  if (x_regno < FIRST_PSEUDO_REGISTER)
+    return 0;
+
+  /* If the pseudo prefers REGNO explicitly, then do not consider
+     REGNO a bad spill choice.  */
+  pref = reg_preferred_class (x_regno);
+  if (reg_class_size[pref] == 1
+      && TEST_HARD_REG_BIT (reg_class_contents[pref], regno))
+    return 0;
+
+  /* If the pseudo conflicts with REGNO, then we consider REGNO a
+     poor choice for a reload regno.  */
+  a = ira_regno_allocno_map[x_regno];
+  if (TEST_HARD_REG_BIT (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a), regno))
+    return 1;
+
+  return 0;
+}
+
+/* Return nonzero if REGNO is a particularly bad choice for reloading
+   IN or OUT.  (Part of the Jeff Law hack)  */
+int
+ira_bad_reload_regno (int regno, rtx in, rtx out)
+{
+  return (ira_bad_reload_regno_1 (regno, in)
+         || ira_bad_reload_regno_1 (regno, out));
+}
+
 /* Find a spill register to use as a reload register for reload R.
    LAST_RELOAD is nonzero if this is the last reload for the insn being
    processed.
@@ -5761,8 +5804,11 @@
      run out of reload regs.  Suppose we have three reloads, and
     reloads A and B can share regs.  These need two regs.
     Suppose A and B are given different regs.
-    That leaves none for C.
     */
-  for (pass = 0; pass < 2; pass++)
+    That leaves none for C.
+
+    NOTE: There are now three passes in accordance with the Jeff
+    Law hack.  */
+  for (pass = 0; pass < 3; pass++)
     {
       /* I is the index in spill_regs.
         We advance it round-robin between insns to use all spill regs
@@ -5801,6 +5847,11 @@
              && ! TEST_HARD_REG_BIT (reload_reg_used_for_inherit,
                                      regnum))))
        {
+         /* BEGIN: Jeff Law's hack */
+         if (pass == 1
+             && ira_bad_reload_regno (regnum, rld[r].in, rld[r].out))
+           continue;
+         /* END: Jeff Law's hack */
          int nr = hard_regno_nregs[regnum][rld[r].mode];
          /* Avoid the problem where spilling a GENERAL_OR_FP_REG
             (on 68000) got us two FP regs.  If NR is 1,
* RE: Possible IRA improvements for irregular register architectures
  2009-12-18 15:33 ` Possible IRA improvements for irregular register architectures Ian Bolton
@ 2010-01-04 14:19 ` Ian Bolton
  2010-01-11 14:34   ` Ian Bolton
  0 siblings, 1 reply; 4+ messages in thread

From: Ian Bolton @ 2010-01-04 14:19 UTC (permalink / raw)
To: gcc

Happy New Year!

I was hoping for some kind of response to this, but maybe I didn't give enough info? I'd appreciate some pointers on what I could do to prompt some discussion, because I have some promising new ideas that lead on from what I've described below.

Cheers,
Ian

> -----Original Message-----
> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of
> Ian Bolton
> Sent: 18 December 2009 15:34
> To: gcc@gcc.gnu.org
> Subject: Possible IRA improvements for irregular register architectures
>
> [quoted text of the 18 December message snipped; see above]
* RE: Possible IRA improvements for irregular register architectures
  2010-01-04 14:19 ` Ian Bolton
@ 2010-01-11 14:34   ` Ian Bolton
  0 siblings, 0 replies; 4+ messages in thread

From: Ian Bolton @ 2010-01-11 14:34 UTC (permalink / raw)
To: gcc

[-- Attachment #1: Type: text/plain, Size: 5407 bytes --]

In the absence of any discussion, I just went ahead and implemented my new ideas, which are attached as a patch!

Whilst it is tailored to the specifics of our architecture - where we have 32 C_REGS, of which there are 16 BOTTOM_REGS, and some insns that only work with BOTTOM_REGS - I believe it could be adapted to other architectures where you want to optimise the allocation of any special subsets of your register set.

The key file to look at is ira-color.c. I have something called "determine_max_abcd", which considers all conflicting allocnos for the one currently being colored, and finds the maximum depth of overlap between those yet-to-be-colored allocnos that require BOTTOM_REGS and those already-colored allocnos that have been assigned a BOTTOM_REG. I use this to tell me whether to give out a BOTTOM_REG to an allocno that can use any of our 32 registers. If the ABCD is greater than or equal to the number of BOTTOM_REGS, then I adjust the costs of the BOTTOM_REGS to nudge this allocno towards using a TOP_CREG instead. Without this intervention, a future allocno that needed BOTTOM_REGS would not have got one, and we would need to plant a spill and fill later to satisfy the needs of its instruction.

This improves performance on all benchmarks I have run on our architecture. I still have other ideas for making it even better, but I figured now was a good time to share what I have. Comments and questions would be greatly appreciated.
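The ABCD calculation can be modelled in a few lines outside of IRA. This is a simplified sketch with plain arrays standing in for IRA's allocno and live-range structures (the struct and function names here are mine, not the patch's): for each program point in the candidate's live range, count the live conflicting allocnos that either still need BOTTOM_REGS or already hold one, and take the maximum over all points.

```c
#include <assert.h>

/* One conflicting allocno, reduced to a single live range plus the
   two facts the ABCD calculation cares about.  */
struct conflict
{
  int start, finish;   /* live range [start, finish], inclusive */
  int needs_bottom;    /* yet-to-be-colored and requires BOTTOM_REGS */
  int holds_bottom;    /* already colored into a BOTTOM_REG */
};

/* Maximum Allocno BOTTOM_REG Conflict Depth over [start, finish]:
   at every program point, how many relevant conflicts are live?  */
int max_abcd (int start, int finish, const struct conflict *c, int n)
{
  int point, i, max = 0;
  for (point = start; point <= finish; point++)
    {
      int depth = 0;
      for (i = 0; i < n; i++)
        if ((c[i].needs_bottom || c[i].holds_bottom)
            && c[i].start <= point && point <= c[i].finish)
          depth++;
      if (depth > max)
        max = depth;
    }
  return max;
}
```

If the maximum reaches the number of allocatable BOTTOM_REGS (15 in the patch, which excludes one fixed register), handing this allocno a BOTTOM_REG would starve a conflicting allocno that has no alternative, so its BOTTOM_REG costs are nudged up and it is steered to a TOP_CREG.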
Cheers,
Ian

> -----Original Message-----
> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of
> Ian Bolton
> Sent: 04 January 2010 14:19
> To: gcc@gcc.gnu.org
> Subject: RE: Possible IRA improvements for irregular register
> architectures
>
> [quoted text of the 4 January message snipped; see above]
>
> > -----Original Message-----
> > From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf
> > Of Ian Bolton
> > Sent: 18 December 2009 15:34
> > To: gcc@gcc.gnu.org
> > Subject: Possible IRA improvements for irregular register
> > architectures
> >
> > [quoted text of the 18 December message snipped; see above]
[-- Attachment #2: bolton_ira_subset_ABCD_experiments.diff --]
[-- Type: application/octet-stream, Size: 13977 bytes --]

Index: toplev.c
===================================================================
--- toplev.c    (revision 155343)
+++ toplev.c    (working copy)
@@ -287,7 +287,7 @@
 /* Set the default region and algorithm for the integrated
    register allocator.  */
-enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_CB;
+enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_PRIORITY;
 enum ira_region flag_ira_region = IRA_REGION_MIXED;

 /* Set the default value for -fira-verbose.  */

Index: ira-int.h
===================================================================
--- ira-int.h   (revision 155343)
+++ ira-int.h   (working copy)
@@ -418,6 +418,9 @@
   ira_allocno_t prev_bucket_allocno;
   /* Used for temporary purposes.  */
   int temp;
+  /* make note that this allocno can only use a subset of ALL_REGS.
+     e.g. BOTTOM_REGS */
+  int requires_register_subset;
 };

 /* All members of the allocno structures should be accessed only
@@ -481,6 +484,7 @@
 #define ALLOCNO_MIN(A) ((A)->min)
 #define ALLOCNO_MAX(A) ((A)->max)
 #define ALLOCNO_CONFLICT_ID(A) ((A)->conflict_id)
+#define ALLOCNO_REQUIRES_REGISTER_SUBSET(A) ((A)->requires_register_subset)

 /* Map regno -> allocnos with given regno (see comments for allocno
    member `next_regno_allocno').  */

Index: ira-color.c
===================================================================
--- ira-color.c (revision 155343)
+++ ira-color.c (working copy)
@@ -429,6 +429,65 @@
     }
 }

+
+/*
+  Calculates the maximum ALLOCNO BOTTOM_REG CONFLICT DEPTH.
+
+  Examine the entire live range for this allocno and look for one or more points
+  where taking a BOTTOM_REG for this allocno will deprive a yet-to-be-colored
+  allocno that needs BOTTOM_REGS.
I identify these points by calculating the + "Allocno BOTTOM_REG Conflict Depth" (ABCD), which is the sum of the number of + yet-to-be-colored allocnos that require BOTTOM_REGS and the number of + BOTTOM_REGS already allocated to allocnos that conflict with this one. + + If the max ABCD across the live range is 15 or greater (the number of non-fixed + BOTTOM_REGS) then we should not take a BOTTOM_REG. +*/ +static int determine_max_abcd (ira_allocno_t a) +{ + allocno_live_range_t r; + ira_allocno_t conflict_allocno; + ira_allocno_conflict_iterator aci; + + /* if this one needs BOTTOM_REGS, the ABCD is irrelevant so return 0 */ + if (ALLOCNO_REQUIRES_REGISTER_SUBSET (a)) + return 0; + + int max = 0; + for (r = ALLOCNO_LIVE_RANGES (a); r != NULL; r = r->next) + { + int prog_point; + for (prog_point = r->finish; prog_point >= r->start; prog_point--) + { + int depth = 0; + FOR_EACH_ALLOCNO_CONFLICT (a, conflict_allocno, aci) + { + allocno_live_range_t s; + if (ALLOCNO_REQUIRES_REGISTER_SUBSET (conflict_allocno) || + (ALLOCNO_ASSIGNED_P (conflict_allocno) && (ALLOCNO_HARD_REGNO (conflict_allocno) < 16))) + { + for (s = ALLOCNO_LIVE_RANGES (conflict_allocno); s != NULL; s = s->next) + { + if (s->finish >= prog_point && s->start <= prog_point) + { + depth++; + break; + } + } + } + } + + if (max < depth) + { + max = depth; + } + } + } + + return max; +} + + /* Choose a hard register for ALLOCNO (or for all coalesced allocnos represented by ALLOCNO). If RETRY_P is TRUE, it means that the function called from function `ira_reassign_conflict_allocnos' and @@ -452,6 +511,7 @@ #ifdef STACK_REGS bool no_stack_reg_p; #endif + int leave_bottom = 0; ira_assert (! 
ALLOCNO_ASSIGNED_P (allocno)); cover_class = ALLOCNO_COVER_CLASS (allocno); @@ -494,6 +554,17 @@ costs[i] += cost; full_costs[i] += cost; } + + + // let's determine the "ALLOCNO_BOTTOM_CONFLICT_DEPTH" + // If it's 15 or more then we cannot take a BOTTOM_REG this time + if (determine_max_abcd(a) >= 15) + { + leave_bottom = 1; + if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) + fprintf(ira_dump_file, "[ABCD optimisation triggered for a%d]\n", ALLOCNO_NUM(a)); + } + /* Take preferences of conflicting allocnos into account. */ FOR_EACH_ALLOCNO_CONFLICT (a, conflict_allocno, aci) /* Reload can give another class so we need to check all @@ -502,7 +573,8 @@ ALLOCNO_NUM (conflict_allocno))) { conflict_cover_class = ALLOCNO_COVER_CLASS (conflict_allocno); - ira_assert (ira_reg_classes_intersect_p + + ira_assert (ira_reg_classes_intersect_p [cover_class][conflict_cover_class]); if (allocno_coalesced_p) { @@ -600,6 +672,15 @@ } if (min_cost > cost) min_cost = cost; + + /* Nudge the cost of this reg up because someone else wants this one. */ + if (leave_bottom && hard_regno < 16) + { + /* we need to add on the add_cost so that caller-save BOTTOM_REGS end up more + expensive than callee-save BOTTOM_REGS */ + full_cost += (1 + add_cost); + } + if (min_full_cost > full_cost) { min_full_cost = full_cost; @@ -607,6 +688,15 @@ ira_assert (hard_regno >= 0); } } + + /* Detect cases where what we added to the full_cost was not enough to force BOTTOM_REG to be avoided. + This can be caused by cost adjustments due to hard reg moves and allocno copies. */ + if (leave_bottom && best_hard_regno < 16) + { + if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) + fprintf(ira_dump_file, "[ABCD optimisation failed to force TOP_CREG usage (a%d got %d)]\n", ALLOCNO_NUM(allocno), best_hard_regno); + } + if (min_full_cost > mem_cost) { if (! 
retry_p && internal_flag_ira_verbose > 3 && ira_dump_file != NULL) @@ -2853,14 +2943,42 @@ ira_reassign_pseudos (int *spilled_pseudo_regs, int num, HARD_REG_SET bad_spill_regs, HARD_REG_SET *pseudo_forbidden_regs, - HARD_REG_SET *pseudo_previous_regs, bitmap spilled) + HARD_REG_SET *pseudo_previous_regs, + bitmap spilled) { int i, m, n, regno; bool changed_p; ira_allocno_t a, conflict_a; HARD_REG_SET forbidden_regs; ira_allocno_conflict_iterator aci; + bitmap temp = BITMAP_ALLOC (NULL); + /* Add pseudos which conflict with pseudos already in + SPILLED_PSEUDO_REGS to SPILLED_PSEUDO_REGS. This is preferable + to allocating in two steps as some of the conflicts might have + a higher priority than the pseudos passed in SPILLED_PSEUDO_REGS. */ + for (i = 0; i < num; i++) + bitmap_set_bit (temp, spilled_pseudo_regs[i]); + + for (i = 0, n = num; i < n; i++) + { + int regno = spilled_pseudo_regs[i]; + bitmap_set_bit (temp, regno); + + a = ira_regno_allocno_map[regno]; + FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci) + if (ALLOCNO_HARD_REGNO (conflict_a) < 0 + && ! ALLOCNO_DONT_REASSIGN_P (conflict_a) + && ! bitmap_bit_p (temp, ALLOCNO_REGNO (conflict_a))) + { + spilled_pseudo_regs[num++] = ALLOCNO_REGNO (conflict_a); + bitmap_set_bit (temp, ALLOCNO_REGNO (conflict_a)); + /* ?!? This seems wrong. 
*/ + bitmap_set_bit (consideration_allocno_bitmap, + ALLOCNO_NUM (conflict_a)); + } + } + if (num > 1) qsort (spilled_pseudo_regs, num, sizeof (int), pseudo_reg_compare); changed_p = false; @@ -2878,7 +2996,7 @@ ira_assert (reg_renumber[regno] < 0); if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) fprintf (ira_dump_file, - " Spill %d(a%d), cost=%d", regno, ALLOCNO_NUM (a), + " Try Assign %d(a%d), cost=%d", regno, ALLOCNO_NUM (a), ALLOCNO_MEMORY_COST (a) - ALLOCNO_COVER_CLASS_COST (a)); allocno_reload_assign (a, forbidden_regs); @@ -2887,60 +3005,8 @@ CLEAR_REGNO_REG_SET (spilled, regno); changed_p = true; } - else - spilled_pseudo_regs[m++] = regno; } - if (m == 0) - return changed_p; - if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) - { - fprintf (ira_dump_file, " Spilled regs"); - for (i = 0; i < m; i++) - fprintf (ira_dump_file, " %d", spilled_pseudo_regs[i]); - fprintf (ira_dump_file, "\n"); - } - /* Try to assign hard registers to pseudos conflicting with ones - from SPILLED_PSEUDO_REGS. */ - for (i = n = 0; i < m; i++) - { - regno = spilled_pseudo_regs[i]; - a = ira_regno_allocno_map[regno]; - FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci) - if (ALLOCNO_HARD_REGNO (conflict_a) < 0 - && ! ALLOCNO_DONT_REASSIGN_P (conflict_a) - && ! 
bitmap_bit_p (consideration_allocno_bitmap, - ALLOCNO_NUM (conflict_a))) - { - sorted_allocnos[n++] = conflict_a; - bitmap_set_bit (consideration_allocno_bitmap, - ALLOCNO_NUM (conflict_a)); - } - } - if (n != 0) - { - setup_allocno_priorities (sorted_allocnos, n); - qsort (sorted_allocnos, n, sizeof (ira_allocno_t), - allocno_priority_compare_func); - for (i = 0; i < n; i++) - { - a = sorted_allocnos[i]; - regno = ALLOCNO_REGNO (a); - COPY_HARD_REG_SET (forbidden_regs, bad_spill_regs); - IOR_HARD_REG_SET (forbidden_regs, pseudo_forbidden_regs[regno]); - IOR_HARD_REG_SET (forbidden_regs, pseudo_previous_regs[regno]); - if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) - fprintf (ira_dump_file, - " Try assign %d(a%d), cost=%d", - regno, ALLOCNO_NUM (a), - ALLOCNO_MEMORY_COST (a) - - ALLOCNO_COVER_CLASS_COST (a)); - if (allocno_reload_assign (a, forbidden_regs)) - { - changed_p = true; - bitmap_clear_bit (spilled, regno); - } - } - } + BITMAP_FREE (temp); return changed_p; } Index: ira-build.c =================================================================== --- ira-build.c (revision 155343) +++ ira-build.c (working copy) @@ -482,6 +482,7 @@ ALLOCNO_LIVE_RANGES (a) = NULL; ALLOCNO_MIN (a) = INT_MAX; ALLOCNO_MAX (a) = -1; + ALLOCNO_REQUIRES_REGISTER_SUBSET (a) = 0; ALLOCNO_CONFLICT_ID (a) = ira_allocnos_num; VEC_safe_push (ira_allocno_t, heap, allocno_vec, a); ira_allocnos = VEC_address (ira_allocno_t, allocno_vec); Index: ira-costs.c =================================================================== --- ira-costs.c (revision 155343) +++ ira-costs.c (working copy) @@ -637,6 +637,14 @@ struct costs *pp = op_costs[i], *qq = this_op_costs[i]; int scale = 1 + (recog_data.operand_type[i] == OP_INOUT); + /* catch any allocnos that want bottom_regs and record this for later */ + if (allocno_pref && allocno_pref[ALLOCNO_NUM + (ira_curr_regno_allocno_map + [REGNO (ops[i])])] == BOTTOM_REGS) + { + ALLOCNO_REQUIRES_REGISTER_SUBSET (ira_curr_regno_allocno_map[REGNO 
(ops[i])]) = 1; + } + pp->mem_cost = MIN (pp->mem_cost, (qq->mem_cost + op_cost_add) * scale); Index: reload1.c =================================================================== --- reload1.c (revision 155343) +++ reload1.c (working copy) @@ -48,6 +48,7 @@ #include "df.h" #include "target.h" #include "emit-rtl.h" +#include "ira-int.h" /* This file contains the reload pass of the compiler, which is run after register allocation has been done. It checks that @@ -5720,6 +5721,48 @@ return 0; } +/* Return nonzero if REGNO is a particularly bad choice for reloading X. + (Part of the Jeff Law hack) */ +static int +ira_bad_reload_regno_1 (int regno, rtx x) +{ + int x_regno; + ira_allocno_t a; + enum reg_class pref; + + /* We only deal with pseudo regs. */ + if (! x || GET_CODE (x) != REG) + return 0; + + x_regno = REGNO (x); + if (x_regno < FIRST_PSEUDO_REGISTER) + return 0; + + /* If the pseudo prefers REGNO explicitly, then do not consider + REGNO a bad spill choice. */ + pref = reg_preferred_class (x_regno); + if (reg_class_size[pref] == 1 +&& TEST_HARD_REG_BIT (reg_class_contents[pref], regno)) + return 0; + + /* If the pseudo conflicts with REGNO, then we consider REGNO a + poor choice for a reload regno. */ + a = ira_regno_allocno_map[x_regno]; + if (TEST_HARD_REG_BIT (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a), regno)) + return 1; + + return 0; +} + +/* Return nonzero if REGNO is a particularly bad choice for reloading + IN or OUT. (Part of the Jeff Law hack) */ +int +ira_bad_reload_regno (int regno, rtx in, rtx out) +{ + return (ira_bad_reload_regno_1 (regno, in) + || ira_bad_reload_regno_1 (regno, out)); +} + /* Find a spill register to use as a reload register for reload R. LAST_RELOAD is nonzero if this is the last reload for the insn being processed. @@ -5761,8 +5804,11 @@ run out of reload regs. Suppose we have three reloads, and reloads A and B can share regs. These need two regs. Suppose A and B are given different regs. - That leaves none for C. 
*/ - for (pass = 0; pass < 2; pass++) + That leaves none for C. + + NOTE: There are now three passes in accordance with the Jeff + Law hack. */ + for (pass = 0; pass < 3; pass++) { /* I is the index in spill_regs. We advance it round-robin between insns to use all spill regs @@ -5801,6 +5847,11 @@ && ! TEST_HARD_REG_BIT (reload_reg_used_for_inherit, regnum)))) { + /* BEGIN: Jeff Law's hack */ + if (pass == 1 + && ira_bad_reload_regno (regnum, rld[r].in, rld[r].out)) + continue; + /* END: Jeff Law's hack */ int nr = hard_regno_nregs[regnum][rld[r].mode]; /* Avoid the problem where spilling a GENERAL_OR_FP_REG (on 68000) got us two FP regs. If NR is 1, ^ permalink raw reply [flat|nested] 4+ messages in thread
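[Editor's illustration] To make the patch's central heuristic easier to follow in isolation, here is a standalone sketch of the ABCD ("Allocno BOTTOM_REG Conflict Depth") computation from determine_max_abcd above. The `Allocno` class and its fields are toy stand-ins invented for this sketch, loosely mirroring (but not matching) GCC's allocno structure:

```python
# Toy sketch of the ABCD metric -- NOT GCC code.  At each program point
# in an allocno's live range, count the conflicting allocnos that either
# still need BOTTOM_REGS or already hold one; the max over all points is
# the ABCD.  Register numbers below 16 stand for BOTTOM_REGS, as in the
# patch's `ALLOCNO_HARD_REGNO (...) < 16` test.

class Allocno:
    def __init__(self, ranges, requires_subset=False, hard_regno=-1):
        self.ranges = ranges              # list of (start, finish) pairs
        self.requires_subset = requires_subset  # must have a BOTTOM_REG?
        self.hard_regno = hard_regno      # assigned hard reg, or -1 if none yet
        self.conflicts = []               # conflicting allocnos

def max_abcd(a):
    if a.requires_subset:   # it needs BOTTOM_REGS itself: ABCD is irrelevant
        return 0
    best = 0
    for start, finish in a.ranges:
        for point in range(start, finish + 1):
            # Depth at this point: conflicts that need or hold a BOTTOM_REG
            # and are live here.
            depth = sum(
                1 for c in a.conflicts
                if (c.requires_subset or 0 <= c.hard_regno < 16)
                and any(s <= point <= f for s, f in c.ranges))
            best = max(best, depth)
    return best

a = Allocno([(0, 10)])
c1 = Allocno([(2, 5)], requires_subset=True)   # still wants a BOTTOM_REG
c2 = Allocno([(4, 8)], hard_regno=3)           # already holds one
c3 = Allocno([(20, 30)], requires_subset=True) # never live alongside a
a.conflicts = [c1, c2, c3]
print(max_abcd(a))  # 2: at points 4-5 both c1 and c2 overlap a
```

In the patch, a max ABCD of 15 or more (the number of non-fixed BOTTOM_REGS on this architecture) sets `leave_bottom`, nudging the BOTTOM_REG costs up so the allocno is steered toward TOP_CREGS instead.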
end of thread, other threads:[~2010-01-11 14:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-14 11:15 Understanding IRA (The Story so far) Ian Bolton
2009-12-18 15:33 ` Possible IRA improvements for irregular register architectures Ian Bolton
2010-01-04 14:19   ` Ian Bolton
2010-01-11 14:34     ` Ian Bolton