* RE: Understanding IRA (The Story so far)
@ 2009-12-14 11:15 Ian Bolton
  2009-12-18 15:33 ` Possible IRA improvements for irregular register architectures Ian Bolton
  0 siblings, 1 reply; 4+ messages in thread

From: Ian Bolton @ 2009-12-14 11:15 UTC (permalink / raw)
To: gcc

For those who have been following my adventure in the world of IRA, or just want to take advantage of some of the time I've spent on this, here's a detailed recap of my exploits and my current status (which I'm very happy with):

I initially thought IRA was calculating costs wrongly for our architecture, so I experimented with adding more costs in different places. This effectively raised the priority of coloring certain allocnos and led to good performance on some benchmarks, but it was a complete hack rather than an acceptable solution, so I went back to the drawing board.

After chatting with Vlad and Jeff, I realised IRA was calculating costs correctly; the real issue was that the costs only reflect the preferences of each allocno, as opposed to their "agnosticism" (i.e. not caring which register they get). Instructions that will happily take any register would take whichever one you showed them first, which meant that high-priority agnostic instructions (colored first) were pinching BOTTOM_REGS before the lower-priority special instructions got a chance.

To address this, I came up with the simple solution of changing the REG_ALLOC_ORDER: I show TOP_CREGS (the top half of our register set) to agnostic instructions before the callee-save BOTTOM_REGS (c7-c15) by default, so they take TOP_CREGS when all other costs are equal. Meanwhile, allocnos requiring BOTTOM_REGS were still given them, because those registers had been left unallocated and their costs were lower than the TOP_CREGS.
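In sketch form, the mechanism being relied on here is just "when costs tie, the register earlier in REG_ALLOC_ORDER wins". The register numbering below and the helper names are mine, made up for illustration; the real order lives in the target's REG_ALLOC_ORDER macro:

```c
#include <assert.h>

/* Hypothetical reordered allocation order: caller-save BOTTOM_REGS
   (c1-c6) first, then TOP_CREGS (c16-c31), and only then the
   callee-save BOTTOM_REGS (c7-c15).  */
static const int reg_alloc_order[] = {
  1, 2, 3, 4, 5, 6,                      /* caller-save BOTTOM_REGS */
  16, 17, 18, 19, 20, 21, 22, 23,        /* TOP_CREGS */
  24, 25, 26, 27, 28, 29, 30, 31,
  7, 8, 9, 10, 11, 12, 13, 14, 15        /* callee-save BOTTOM_REGS */
};

/* Position of HARD_REGNO in the order; lower means preferred.  */
static int order_pos (int hard_regno)
{
  int i;
  int n = (int) (sizeof reg_alloc_order / sizeof reg_alloc_order[0]);
  for (i = 0; i < n; i++)
    if (reg_alloc_order[i] == hard_regno)
      return i;
  return n;  /* not allocatable: sort it last */
}

/* When two hard registers have equal cost, the allocator takes the
   one that appears first in REG_ALLOC_ORDER.  */
int tie_break (int regno_a, int regno_b)
{
  return order_pos (regno_a) < order_pos (regno_b) ? regno_a : regno_b;
}
```

With this order, an agnostic allocno whose costs are flat across ALL_REGS consumes c1-c6 and the TOP_CREGS before touching c7-c15, leaving the callee-save BOTTOM_REGS for allocnos whose costs actually favour them.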
This resulted in good performance improvements on some key benchmarks, where previously agnostic instructions had been unwittingly using up these special BOTTOM_REGS, leading us to allocate TOP_CREGS for operands in special instructions (which then required spills and moves to get a BOTTOM_REG in time to do the special instruction).

The main drawback of this alternate REG_ALLOC_ORDER was that picking a TOP_CREG reduces the chance of reusing a callee-save register, because a TOP_CREG does not work in all instructions, and I ended up with regressions on some low-pressure benchmarks because we used more callee-save registers than before.

I then tried various other ways of keeping the performance improvements on the high register pressure benchmarks while eliminating the regressions on the low-pressure ones:

* I tried using '?' constraints on agnostic instructions, so that they erred towards TOP_CREGS instead of callee-save BOTTOM_REGS, but this biased them too far away from the more reusable BOTTOM_REGS (even when there was no need to avoid them) and I subsequently got regressions;

* I tried restricting the effect of the '?' by not using it for the register preferencing pass and by only adding on the alt_cost when I had determined that the allocno never requires a BOTTOM_REG. This got some decent performance, but I was still penalising agnostic instructions for using BOTTOM_REGS regardless of whether other allocnos were demanding them, so I got regressions;

* I tried coloring the allocnos that need BOTTOM_REGS first, which maximises the reuse opportunities, but this was essentially a rejection of the existing tried-and-tested coloring heuristics in IRA and consequently led to some regressions;

* I tried combining the new REG_ALLOC_ORDER with a modified coloring heuristic that uses the need for BOTTOM_REGS as a tie-breaker when two allocnos are otherwise indistinguishable, but this overrode the ALLOCNO_NUM ordering (which I guess represents live range starting points?), so I got some regressions.

Fortunately, I was hit by another epiphany on Monday night: I could emulate the new REG_ALLOC_ORDER dynamically within assign_hard_reg, and only when necessary. When allocating for an agnostic allocno, if it conflicted with other allocnos that needed BOTTOM_REGS, I added 1 to the full_cost of each of the BOTTOM_REGS so this allocno took a TOP_CREG.

This gave me my best performance overall, but there were still some regressions on low-pressure benchmarks. Whilst investigating them, I realised I had gone beyond my original new REG_ALLOC_ORDER, because I was now avoiding all BOTTOM_REGS instead of just the callee-save ones, so I changed assign_hard_reg to add 1 only to each callee-save BOTTOM_REG. This means we happily use up the caller-save BOTTOM_REGS (c1-c6), regardless of whether the allocno is agnostic or not, which leads to more reusability in low-pressure cases (whereas avoiding all BOTTOM_REGS for an agnostic instruction decreases reusability).

Note that I had been experimenting with both the Priority algorithm and Chaitin-Briggs up to this point, and I have decided to stick with Priority for now because we are using Priority already, which minimises the disruption and reduces the risk of regressions on untested inputs. (Results generally show CB to give better performance and smaller code size, but it suffers badly in very high pressure cases and spills more than Priority.)

As of yesterday, I had two versions of the Priority algorithm IRA: one that biases against all BOTTOM_REGS when an agnostic allocno conflicts with one that wants BOTTOM_REGS, and one that biases against only the callee-save BOTTOM_REGS. The former got me better performance on high-pressure benchmarks and the latter got me better performance on the ones with some low pressure.
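The difference between those two versions can be sketched as a pair of cost-penalty predicates (helper names and the concrete register split are my own illustration, not code from the patch; the real `+1` lands on `full_cost` inside assign_hard_reg):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout from the post: BOTTOM_REGS are c0-c15, of which
   c1-c6 are caller-save and c7-c15 are callee-save.  */
static bool bottom_reg_p (int hard_regno)
{
  return hard_regno >= 0 && hard_regno < 16;
}

static bool callee_save_bottom_p (int hard_regno)
{
  return hard_regno >= 7 && hard_regno <= 15;
}

/* Version 1: when an agnostic allocno conflicts with one that wants
   BOTTOM_REGS, nudge it away from every BOTTOM_REG.  Good under high
   pressure, costly under low pressure.  */
int penalty_all_bottom (int hard_regno, bool conflict_needs_bottom)
{
  return (conflict_needs_bottom && bottom_reg_p (hard_regno)) ? 1 : 0;
}

/* Version 2: nudge only against the callee-save BOTTOM_REGS, so the
   caller-save c1-c6 stay freely reusable.  Good under low pressure.  */
int penalty_callee_save_bottom (int hard_regno, bool conflict_needs_bottom)
{
  return (conflict_needs_bottom && callee_save_bottom_p (hard_regno)) ? 1 : 0;
}
```

Either penalty is added to a register's full_cost only while the allocno being colored is agnostic and conflicts with a BOTTOM_REGS-hungry allocno; with no such conflict, both versions leave the costs untouched.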
Either was an improvement over what I started with (default IRA), but I wanted the best of both worlds, so I needed a hybrid that biases against only the callee-save BOTTOM_REGS under low pressure and against all BOTTOM_REGS under high pressure.

Sure, I could have put in some ugly hack that looks at the register pressure number and picks one or the other, but instead I came up with a more intelligent solution, which takes advantage of our single caller-save TOP_CREG (c16), which appears after the caller-save BOTTOM_REGS in our REG_ALLOC_ORDER. Now, when an agnostic allocno conflicts with one that wants BOTTOM_REGS, I add 1 to the cost of c16 too, so we still give out c1, c2, etc. early on (because they have lower costs than all the callee-save regs and appear earlier in the REG_ALLOC_ORDER than c16) to whichever allocno gets there first. But later, when all the caller-save regs are allocated, we will use our first callee-save reg (agnostic allocnos will take c17; ones that need BOTTOM_REGS will take c7). At that point, the already-allocated callee-save reg becomes cheaper than the caller-save regs, and we use it in preference to them (including c1-c6). The result is that, under high pressure, we hold back all the BOTTOM_REGS for those that need them.

The performance of this version is my best ever and I am very happy with it. It doesn't quite reach the highest performance I've seen on one of the high-pressure benchmarks, because we may still be giving up some BOTTOM_REGS to agnostic instructions early on, but it's pretty good otherwise.

Thanks for reading and happy allocating!
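The hybrid described above can be condensed into a single predicate. This is a sketch under the register numbering from the post (c0-c15 BOTTOM_REGS, c16 the lone caller-save TOP_CREG, c17 the first callee-save TOP_CREG); `allocated_p` is my stand-in for the allocator's record of which hard registers have already been handed out:

```c
#include <assert.h>
#include <stdbool.h>

/* +1 full_cost nudge applied while an agnostic allocno conflicts with
   one that needs BOTTOM_REGS.  */
int hybrid_penalty (int hard_regno, const bool *allocated_p)
{
  /* Always nudge away from BOTTOM_REGS (c0-c15).  */
  if (hard_regno < 16)
    return 1;

  /* Keep c16 exactly as expensive as the penalised c1-c6 until the
     first callee-save TOP_CREG (c17) has been allocated; after that,
     c16 becomes the cheap escape hatch for agnostic allocnos.  */
  if (hard_regno == 16 && !allocated_p[17])
    return 1;

  return 0;
}
```

Early on, the penalised c1-c6 and c16 all tie, so the earliest of them in REG_ALLOC_ORDER (c1, then c2, ...) is taken, which keeps register usage low in low-pressure functions; once c17 is in use, agnostic allocnos spread across the TOP_CREGS while c7-c15 stay reserved for the instructions that need them.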
* Possible IRA improvements for irregular register architectures
  2009-12-14 11:15 Understanding IRA (The Story so far) Ian Bolton
@ 2009-12-18 15:33 ` Ian Bolton
  2010-01-04 14:19   ` Ian Bolton
  0 siblings, 1 reply; 4+ messages in thread

From: Ian Bolton @ 2009-12-18 15:33 UTC (permalink / raw)
To: gcc

[-- Attachment #1: Type: text/plain, Size: 2820 bytes --]

Let's assume I have two sub-classes of ALL_REGS: BOTTOM_REGS (c0-c15) and TOP_CREGS (c16-c31).

Let's also assume I have two main types of instruction: A-type Instructions, which can use ALL 32 registers, and B-type Instructions, which can only use the 16 BOTTOM_REGS.

IRA will correctly calculate costs (in ira-costs.c) for allocnos appearing in B-type instructions, such that TOP_CREGS has a higher cost than BOTTOM_REGS. It will also calculate costs for the A-type instructions such that TOP_CREGS and BOTTOM_REGS have the same cost.

The order of coloring will be determined by the algorithm chosen: Priority or Chaitin-Briggs. As each allocno is colored, the costs will be inspected and the best available hard register will be chosen, mainly based on the register class costs mentioned above, so allocnos in B-type Instructions will usually be assigned a BOTTOM_REG if one is free. If two or more hard registers share the same cost then whichever one appears first in the REG_ALLOC_ORDER will be assigned. (If no hard register can be found, the allocno is assigned memory and will require a "reload" in a later pass to get a hard register.)

I do not wish to alter the coloring algorithms or the coloring order. I believe they are good at determining the order to color allocnos, which dictates the likelihood of being assigned a hard register. What I wish to change is the hard register that is assigned, given that the coloring order has determined that this allocno should get one next.

Why do I need to do this?
Since the BOTTOM_REGS can be used by all instructions, it makes sense to put them first in the REG_ALLOC_ORDER, so we minimise the number of registers consumed by a low-pressure function. But it also makes sense, in high-pressure functions, to steer A-type Instructions away from BOTTOM_REGS so that they remain free for B-type Instructions to use.

To achieve this, I tried altering the costs calculated in ira-costs.c, either explicitly with various hacks or by altering operand constraints. The problem with this approach was that it is static and independent, occurring before any coloring order has been determined and without any awareness of the needs of other allocnos. I believe I require a dynamic way to alter the costs, based on which allocnos conflict with the allocno currently being colored and which hard registers are still available at that point.

The patch I have attached here is my first reasonably successful attempt at this dynamic approach, which has led to performance improvements on some of our benchmarks and no significant regressions. I am hoping it will be useful to others, but I post it more as a talking point, or perhaps to inspire others to come up with better solutions and share them with me :-)

[-- Attachment #2: bolton_ira_subset_experiments.diff --]
[-- Type: application/octet-stream, Size: 11903 bytes --]

Index: toplev.c
===================================================================
--- toplev.c    (revision 155343)
+++ toplev.c    (working copy)
@@ -287,7 +287,7 @@
 /* Set the default region and algorithm for the integrated
    register allocator.  */
-enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_CB;
+enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_PRIORITY;
 enum ira_region flag_ira_region = IRA_REGION_MIXED;

 /* Set the default value for -fira-verbose.
    */

Index: ira-int.h
===================================================================
--- ira-int.h   (revision 155343)
+++ ira-int.h   (working copy)
@@ -418,6 +418,9 @@
   ira_allocno_t prev_bucket_allocno;
   /* Used for temporary purposes.  */
   int temp;
+  /* make note that this allocno can only use a subset of ALL_REGS.
+     e.g. BOTTOM_REGS */
+  int requires_register_subset;
 };

 /* All members of the allocno structures should be accessed only
@@ -481,6 +484,7 @@
 #define ALLOCNO_MIN(A) ((A)->min)
 #define ALLOCNO_MAX(A) ((A)->max)
 #define ALLOCNO_CONFLICT_ID(A) ((A)->conflict_id)
+#define ALLOCNO_REQUIRES_REGISTER_SUBSET(A) ((A)->requires_register_subset)

 /* Map regno -> allocnos with given regno (see comments for allocno
    member `next_regno_allocno').  */

Index: ira-color.c
===================================================================
--- ira-color.c (revision 155343)
+++ ira-color.c (working copy)
@@ -452,6 +452,7 @@
 #ifdef STACK_REGS
   bool no_stack_reg_p;
 #endif
+  int leave_bottom = 0;

   ira_assert (! ALLOCNO_ASSIGNED_P (allocno));
   cover_class = ALLOCNO_COVER_CLASS (allocno);
@@ -502,6 +503,20 @@
             ALLOCNO_NUM (conflict_allocno)))
           {
             conflict_cover_class = ALLOCNO_COVER_CLASS (conflict_allocno);
+
+            /* If we don't need the register subset, but a conflicting allocno does,
+               we should leave the bottom regs, to ensure that allocno gets what it needs.
+               No change is made in reload pass, however.  */
+            if (!retry_p &&
+                !ALLOCNO_ASSIGNED_P (conflict_allocno) &&
+                (ALLOCNO_REQUIRES_REGISTER_SUBSET (a) == 0) &&
+                (ALLOCNO_REQUIRES_REGISTER_SUBSET (conflict_allocno)))
+              {
+                /* WE DON'T WANT BOTTOM, BUT THEY DO ...
+                   .... SO INCREASE THE COSTS FOR BOTTOM_REGS SO WE ERR TOWARDS TOP_REGS */
+                leave_bottom += 1;
+              }
+
             ira_assert (ira_reg_classes_intersect_p
                         [cover_class][conflict_cover_class]);
             if (allocno_coalesced_p)
@@ -600,6 +615,27 @@
         }
       if (min_cost > cost)
         min_cost = cost;
+
+      /* Nudge the cost of this reg up because someone else wants this one.
+
+         Note: if we only penalised BOTTOM_REGS then we would skip over c2, c3, c4, c5, c6
+         and go straight for c16 because it is cheaper than currently non-allocated
+         callee-save TOP_CREGS.  By penalising c16 too, the cost for c2, c3, c4, c5,
+         c6 and c16 will all be the same AND lower than any of the currently
+         unallocated TOP_CREGS, so we will take the lowest number (c2).  This is good
+         for limiting the registers we use up in low-pressure cases.  */
+      if (leave_bottom)
+        {
+          /* Always add 1 to full_cost of BOTTOM_REGS.
+             Only add 1 to full_cost of c16 when c17 (the first callee-save TOP_CREG) has
+             not yet been allocated.  This effectively means anyone can get c1-c6 until we
+             grab our first callee-save TOP_CREG; after that, we try to reserve c1-c6 for
+             instructions that want them specifically by encouraging use of c16, c17 first.  */
+          if (hard_regno < 16 ||
+              ((hard_regno == 16) && (!allocated_hardreg_p[17])))
+            full_cost += 1;
+        }
+
       if (min_full_cost > full_cost)
         {
           min_full_cost = full_cost;
@@ -2853,14 +2889,42 @@
 ira_reassign_pseudos (int *spilled_pseudo_regs, int num,
                      HARD_REG_SET bad_spill_regs,
                      HARD_REG_SET *pseudo_forbidden_regs,
-                     HARD_REG_SET *pseudo_previous_regs, bitmap spilled)
+                     HARD_REG_SET *pseudo_previous_regs,
+                     bitmap spilled)
 {
   int i, m, n, regno;
   bool changed_p;
   ira_allocno_t a, conflict_a;
   HARD_REG_SET forbidden_regs;
   ira_allocno_conflict_iterator aci;
+  bitmap temp = BITMAP_ALLOC (NULL);

+  /* Add pseudos which conflict with pseudos already in
+     SPILLED_PSEUDO_REGS to SPILLED_PSEUDO_REGS.  This is preferable
+     to allocating in two steps as some of the conflicts might have
+     a higher priority than the pseudos passed in SPILLED_PSEUDO_REGS.
     */
+  for (i = 0; i < num; i++)
+    bitmap_set_bit (temp, spilled_pseudo_regs[i]);
+
+  for (i = 0, n = num; i < n; i++)
+    {
+      int regno = spilled_pseudo_regs[i];
+      bitmap_set_bit (temp, regno);
+
+      a = ira_regno_allocno_map[regno];
+      FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci)
+        if (ALLOCNO_HARD_REGNO (conflict_a) < 0
+            && ! ALLOCNO_DONT_REASSIGN_P (conflict_a)
+            && ! bitmap_bit_p (temp, ALLOCNO_REGNO (conflict_a)))
+          {
+            spilled_pseudo_regs[num++] = ALLOCNO_REGNO (conflict_a);
+            bitmap_set_bit (temp, ALLOCNO_REGNO (conflict_a));
+            /* ?!? This seems wrong.  */
+            bitmap_set_bit (consideration_allocno_bitmap,
+                            ALLOCNO_NUM (conflict_a));
+          }
+    }
+
   if (num > 1)
     qsort (spilled_pseudo_regs, num, sizeof (int), pseudo_reg_compare);
   changed_p = false;
@@ -2878,7 +2942,7 @@
       ira_assert (reg_renumber[regno] < 0);
       if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
        fprintf (ira_dump_file,
-                "      Spill %d(a%d), cost=%d", regno, ALLOCNO_NUM (a),
+                "      Try Assign %d(a%d), cost=%d", regno, ALLOCNO_NUM (a),
                 ALLOCNO_MEMORY_COST (a)
                 - ALLOCNO_COVER_CLASS_COST (a));
       allocno_reload_assign (a, forbidden_regs);
@@ -2887,60 +2951,8 @@
          CLEAR_REGNO_REG_SET (spilled, regno);
          changed_p = true;
        }
-      else
-       spilled_pseudo_regs[m++] = regno;
     }
-  if (m == 0)
-    return changed_p;
-  if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-    {
-      fprintf (ira_dump_file, "      Spilled regs");
-      for (i = 0; i < m; i++)
-       fprintf (ira_dump_file, " %d", spilled_pseudo_regs[i]);
-      fprintf (ira_dump_file, "\n");
-    }
-  /* Try to assign hard registers to pseudos conflicting with ones
-     from SPILLED_PSEUDO_REGS.  */
-  for (i = n = 0; i < m; i++)
-    {
-      regno = spilled_pseudo_regs[i];
-      a = ira_regno_allocno_map[regno];
-      FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci)
-       if (ALLOCNO_HARD_REGNO (conflict_a) < 0
-           && ! ALLOCNO_DONT_REASSIGN_P (conflict_a)
-           && !
 bitmap_bit_p (consideration_allocno_bitmap,
-                           ALLOCNO_NUM (conflict_a)))
-         {
-           sorted_allocnos[n++] = conflict_a;
-           bitmap_set_bit (consideration_allocno_bitmap,
-                           ALLOCNO_NUM (conflict_a));
-         }
-    }
-  if (n != 0)
-    {
-      setup_allocno_priorities (sorted_allocnos, n);
-      qsort (sorted_allocnos, n, sizeof (ira_allocno_t),
-            allocno_priority_compare_func);
-      for (i = 0; i < n; i++)
-       {
-         a = sorted_allocnos[i];
-         regno = ALLOCNO_REGNO (a);
-         COPY_HARD_REG_SET (forbidden_regs, bad_spill_regs);
-         IOR_HARD_REG_SET (forbidden_regs, pseudo_forbidden_regs[regno]);
-         IOR_HARD_REG_SET (forbidden_regs, pseudo_previous_regs[regno]);
-         if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL)
-           fprintf (ira_dump_file,
-                    "        Try assign %d(a%d), cost=%d",
-                    regno, ALLOCNO_NUM (a),
-                    ALLOCNO_MEMORY_COST (a)
-                    - ALLOCNO_COVER_CLASS_COST (a));
-         if (allocno_reload_assign (a, forbidden_regs))
-           {
-             changed_p = true;
-             bitmap_clear_bit (spilled, regno);
-           }
-       }
-    }
+  BITMAP_FREE (temp);
   return changed_p;
 }

Index: ira-build.c
===================================================================
--- ira-build.c (revision 155343)
+++ ira-build.c (working copy)
@@ -482,6 +482,7 @@
   ALLOCNO_LIVE_RANGES (a) = NULL;
   ALLOCNO_MIN (a) = INT_MAX;
   ALLOCNO_MAX (a) = -1;
+  ALLOCNO_REQUIRES_REGISTER_SUBSET (a) = 0;
   ALLOCNO_CONFLICT_ID (a) = ira_allocnos_num;
   VEC_safe_push (ira_allocno_t, heap, allocno_vec, a);
   ira_allocnos = VEC_address (ira_allocno_t, allocno_vec);

Index: ira-costs.c
===================================================================
--- ira-costs.c (revision 155343)
+++ ira-costs.c (working copy)
@@ -637,6 +637,14 @@
              struct costs *pp = op_costs[i], *qq = this_op_costs[i];
              int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);

+             /* catch any allocnos that want bottom_regs and record this for later */
+             if (allocno_pref && allocno_pref[ALLOCNO_NUM
+                                              (ira_curr_regno_allocno_map
+                                               [REGNO (ops[i])])] == BOTTOM_REGS)
+               {
+                 ALLOCNO_REQUIRES_REGISTER_SUBSET (ira_curr_regno_allocno_map[REGNO
 (ops[i])]) = 1;
+               }
+
              pp->mem_cost = MIN (pp->mem_cost,
                                  (qq->mem_cost + op_cost_add) * scale);

Index: reload1.c
===================================================================
--- reload1.c   (revision 155343)
+++ reload1.c   (working copy)
@@ -48,6 +48,7 @@
 #include "df.h"
 #include "target.h"
 #include "emit-rtl.h"
+#include "ira-int.h"

 /* This file contains the reload pass of the compiler, which is run
    after register allocation has been done.  It checks that
@@ -5720,6 +5721,48 @@
   return 0;
 }

+/* Return nonzero if REGNO is a particularly bad choice for reloading X.
+   (Part of the Jeff Law hack)  */
+static int
+ira_bad_reload_regno_1 (int regno, rtx x)
+{
+  int x_regno;
+  ira_allocno_t a;
+  enum reg_class pref;
+
+  /* We only deal with pseudo regs.  */
+  if (! x || GET_CODE (x) != REG)
+    return 0;
+
+  x_regno = REGNO (x);
+  if (x_regno < FIRST_PSEUDO_REGISTER)
+    return 0;
+
+  /* If the pseudo prefers REGNO explicitly, then do not consider
+     REGNO a bad spill choice.  */
+  pref = reg_preferred_class (x_regno);
+  if (reg_class_size[pref] == 1
+      && TEST_HARD_REG_BIT (reg_class_contents[pref], regno))
+    return 0;
+
+  /* If the pseudo conflicts with REGNO, then we consider REGNO a
+     poor choice for a reload regno.  */
+  a = ira_regno_allocno_map[x_regno];
+  if (TEST_HARD_REG_BIT (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a), regno))
+    return 1;
+
+  return 0;
+}
+
+/* Return nonzero if REGNO is a particularly bad choice for reloading
+   IN or OUT.  (Part of the Jeff Law hack)  */
+int
+ira_bad_reload_regno (int regno, rtx in, rtx out)
+{
+  return (ira_bad_reload_regno_1 (regno, in)
+         || ira_bad_reload_regno_1 (regno, out));
+}
+
 /* Find a spill register to use as a reload register for reload R.
    LAST_RELOAD is nonzero if this is the last reload for the insn being
    processed.
@@ -5761,8 +5804,11 @@
      run out of reload regs.  Suppose we have three reloads, and
     reloads A and B can share regs.  These need two regs.
     Suppose A and B are given different regs.
-    That leaves none for C.
     */
-  for (pass = 0; pass < 2; pass++)
+    That leaves none for C.
+
+    NOTE: There are now three passes in accordance with the Jeff
+    Law hack.  */
+  for (pass = 0; pass < 3; pass++)
     {
       /* I is the index in spill_regs.
         We advance it round-robin between insns to use all spill regs
@@ -5801,6 +5847,11 @@
              && ! TEST_HARD_REG_BIT (reload_reg_used_for_inherit,
                                      regnum))))
        {
+         /* BEGIN: Jeff Law's hack */
+         if (pass == 1
+             && ira_bad_reload_regno (regnum, rld[r].in, rld[r].out))
+           continue;
+         /* END: Jeff Law's hack */
          int nr = hard_regno_nregs[regnum][rld[r].mode];
          /* Avoid the problem where spilling a GENERAL_OR_FP_REG
             (on 68000) got us two FP regs.  If NR is 1,
* RE: Possible IRA improvements for irregular register architectures
  2009-12-18 15:33 ` Possible IRA improvements for irregular register architectures Ian Bolton
@ 2010-01-04 14:19 ` Ian Bolton
  2010-01-11 14:34   ` Ian Bolton
  0 siblings, 1 reply; 4+ messages in thread

From: Ian Bolton @ 2010-01-04 14:19 UTC (permalink / raw)
To: gcc

Happy New Year!

I was hoping for some kind of response to this, but maybe I didn't give enough info? I'd appreciate some pointers on what I could do to prompt some discussion, because I have some promising new ideas that lead on from what I've described below.

Cheers,
Ian

> -----Original Message-----
> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of
> Ian Bolton
> Sent: 18 December 2009 15:34
> To: gcc@gcc.gnu.org
> Subject: Possible IRA improvements for irregular register architectures
>
> [quoted text of the 18 December message snipped; see above]
* RE: Possible IRA improvements for irregular register architectures
  2010-01-04 14:19 ` Ian Bolton
@ 2010-01-11 14:34   ` Ian Bolton
  0 siblings, 0 replies; 4+ messages in thread

From: Ian Bolton @ 2010-01-11 14:34 UTC (permalink / raw)
To: gcc

[-- Attachment #1: Type: text/plain, Size: 5407 bytes --]

In the absence of any discussion, I just went ahead and implemented my new ideas, which are attached as a patch!

Whilst it is tailored to the specifics of our architecture - where we have 32 C_REGS, of which there are 16 BOTTOM_REGS, and some insns that only work with BOTTOM_REGS - I believe it could be adapted to other architectures where you want to optimise the allocation of any special subsets of your register set.

The key file to look at is ira-color.c. I have something called "determine_max_abcd", which considers all conflicting allocnos for the one currently being colored, and finds the maximum depth of overlap between those yet-to-be-colored allocnos that require BOTTOM_REGS and those already-colored allocnos that have been assigned a BOTTOM_REG. I use this to tell me whether to give out a BOTTOM_REG to an allocno that can use any of our 32 registers. If the ABCD is greater than or equal to the number of BOTTOM_REGS, then I adjust the costs of the BOTTOM_REGS to nudge this allocno towards using a TOP_CREG instead. Without this intervention, a future allocno that needed BOTTOM_REGS would not have got one, and we would need to plant a spill and fill later to satisfy the needs of its instruction.

This improves performance on all benchmarks I have run on our architecture. I still have other ideas for making it even better, but I figured now was a good time to share what I have. Comments and questions would be greatly appreciated.
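The ABCD calculation can be modelled in a few lines outside of IRA. This is a simplified sketch with plain arrays standing in for IRA's allocno and live-range structures (the struct and function names here are mine, not the patch's): for each program point in the candidate's live range, count the live conflicting allocnos that either still need BOTTOM_REGS or already hold one, and take the maximum over all points.

```c
#include <assert.h>

/* One conflicting allocno, reduced to a single live range plus the
   two facts the ABCD calculation cares about.  */
struct conflict
{
  int start, finish;   /* live range [start, finish], inclusive */
  int needs_bottom;    /* yet-to-be-colored and requires BOTTOM_REGS */
  int holds_bottom;    /* already colored into a BOTTOM_REG */
};

/* Maximum Allocno BOTTOM_REG Conflict Depth over [start, finish]:
   at every program point, how many relevant conflicts are live?  */
int max_abcd (int start, int finish, const struct conflict *c, int n)
{
  int point, i, max = 0;
  for (point = start; point <= finish; point++)
    {
      int depth = 0;
      for (i = 0; i < n; i++)
        if ((c[i].needs_bottom || c[i].holds_bottom)
            && c[i].start <= point && point <= c[i].finish)
          depth++;
      if (depth > max)
        max = depth;
    }
  return max;
}
```

If the maximum reaches the number of allocatable BOTTOM_REGS (15 in the patch, which excludes one fixed register), handing this allocno a BOTTOM_REG would starve a conflicting allocno that has no alternative, so its BOTTOM_REG costs are nudged up and it is steered to a TOP_CREG.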
Cheers,
Ian

> -----Original Message-----
> From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf Of
> Ian Bolton
> Sent: 04 January 2010 14:19
> To: gcc@gcc.gnu.org
> Subject: RE: Possible IRA improvements for irregular register
> architectures
>
> [quoted text of the 4 January message snipped; see above]
>
> > -----Original Message-----
> > From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf
> > Of Ian Bolton
> > Sent: 18 December 2009 15:34
> > To: gcc@gcc.gnu.org
> > Subject: Possible IRA improvements for irregular register
> > architectures
> >
> > [quoted text of the 18 December message snipped; see above]
[-- Attachment #2: bolton_ira_subset_ABCD_experiments.diff --]
[-- Type: application/octet-stream, Size: 13977 bytes --]

Index: toplev.c
===================================================================
--- toplev.c    (revision 155343)
+++ toplev.c    (working copy)
@@ -287,7 +287,7 @@
 /* Set the default region and algorithm for the integrated
    register allocator.  */
-enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_CB;
+enum ira_algorithm flag_ira_algorithm = IRA_ALGORITHM_PRIORITY;
 enum ira_region flag_ira_region = IRA_REGION_MIXED;

 /* Set the default value for -fira-verbose.  */

Index: ira-int.h
===================================================================
--- ira-int.h   (revision 155343)
+++ ira-int.h   (working copy)
@@ -418,6 +418,9 @@
   ira_allocno_t prev_bucket_allocno;
   /* Used for temporary purposes.  */
   int temp;
+  /* make note that this allocno can only use a subset of ALL_REGS.
+     e.g. BOTTOM_REGS */
+  int requires_register_subset;
 };

 /* All members of the allocno structures should be accessed only
@@ -481,6 +484,7 @@
 #define ALLOCNO_MIN(A) ((A)->min)
 #define ALLOCNO_MAX(A) ((A)->max)
 #define ALLOCNO_CONFLICT_ID(A) ((A)->conflict_id)
+#define ALLOCNO_REQUIRES_REGISTER_SUBSET(A) ((A)->requires_register_subset)

 /* Map regno -> allocnos with given regno (see comments for allocno
    member `next_regno_allocno').  */

Index: ira-color.c
===================================================================
--- ira-color.c (revision 155343)
+++ ira-color.c (working copy)
@@ -429,6 +429,65 @@
     }
 }

+
+/*
+  Calculates the maximum ALLOCNO BOTTOM_REG CONFLICT DEPTH.
+
+  Examine the entire live range for this allocno and look for one or more points
+  where taking a BOTTOM_REG for this allocno will deprive a yet-to-be-colored
+  allocno that needs BOTTOM_REGS.
I identify these points by calculating the + "Allocno BOTTOM_REG Conflict Depth" (ABCD), which is the sum of the number of + yet-to-be-colored allocnos that require BOTTOM_REGS and the number of + BOTTOM_REGS already allocated to allocnos that conflict with this one. + + If the max ABCD across the live range is 15 or greater (the number of non-fixed + BOTTOM_REGS) then we should not take a BOTTOM_REG. +*/ +static int determine_max_abcd (ira_allocno_t a) +{ + allocno_live_range_t r; + ira_allocno_t conflict_allocno; + ira_allocno_conflict_iterator aci; + + /* if this one needs BOTTOM_REGS, the ABCD is irrelevant so return 0 */ + if (ALLOCNO_REQUIRES_REGISTER_SUBSET (a)) + return 0; + + int max = 0; + for (r = ALLOCNO_LIVE_RANGES (a); r != NULL; r = r->next) + { + int prog_point; + for (prog_point = r->finish; prog_point >= r->start; prog_point--) + { + int depth = 0; + FOR_EACH_ALLOCNO_CONFLICT (a, conflict_allocno, aci) + { + allocno_live_range_t s; + if (ALLOCNO_REQUIRES_REGISTER_SUBSET (conflict_allocno) || + (ALLOCNO_ASSIGNED_P (conflict_allocno) && (ALLOCNO_HARD_REGNO (conflict_allocno) < 16))) + { + for (s = ALLOCNO_LIVE_RANGES (conflict_allocno); s != NULL; s = s->next) + { + if (s->finish >= prog_point && s->start <= prog_point) + { + depth++; + break; + } + } + } + } + + if (max < depth) + { + max = depth; + } + } + } + + return max; +} + + /* Choose a hard register for ALLOCNO (or for all coalesced allocnos represented by ALLOCNO). If RETRY_P is TRUE, it means that the function called from function `ira_reassign_conflict_allocnos' and @@ -452,6 +511,7 @@ #ifdef STACK_REGS bool no_stack_reg_p; #endif + int leave_bottom = 0; ira_assert (! 
ALLOCNO_ASSIGNED_P (allocno)); cover_class = ALLOCNO_COVER_CLASS (allocno); @@ -494,6 +554,17 @@ costs[i] += cost; full_costs[i] += cost; } + + + // let's determine the "ALLOCNO_BOTTOM_CONFLICT_DEPTH" + // If it's 15 or more then we cannot take a BOTTOM_REG this time + if (determine_max_abcd(a) >= 15) + { + leave_bottom = 1; + if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) + fprintf(ira_dump_file, "[ABCD optimisation triggered for a%d]\n", ALLOCNO_NUM(a)); + } + /* Take preferences of conflicting allocnos into account. */ FOR_EACH_ALLOCNO_CONFLICT (a, conflict_allocno, aci) /* Reload can give another class so we need to check all @@ -502,7 +573,8 @@ ALLOCNO_NUM (conflict_allocno))) { conflict_cover_class = ALLOCNO_COVER_CLASS (conflict_allocno); - ira_assert (ira_reg_classes_intersect_p + + ira_assert (ira_reg_classes_intersect_p [cover_class][conflict_cover_class]); if (allocno_coalesced_p) { @@ -600,6 +672,15 @@ } if (min_cost > cost) min_cost = cost; + + /* Nudge the cost of this reg up because someone else wants this one. */ + if (leave_bottom && hard_regno < 16) + { + /* we need to add on the add_cost so that caller-save BOTTOM_REGS end up more + expensive than callee-save BOTTOM_REGS */ + full_cost += (1 + add_cost); + } + if (min_full_cost > full_cost) { min_full_cost = full_cost; @@ -607,6 +688,15 @@ ira_assert (hard_regno >= 0); } } + + /* Detect cases where what we added to the full_cost was not enough to force BOTTOM_REG to be avoided. + This can be caused by cost adjustments due to hard reg moves and allocno copies. */ + if (leave_bottom && best_hard_regno < 16) + { + if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) + fprintf(ira_dump_file, "[ABCD optimisation failed to force TOP_CREG usage (a%d got %d)]\n", ALLOCNO_NUM(allocno), best_hard_regno); + } + if (min_full_cost > mem_cost) { if (! 
retry_p && internal_flag_ira_verbose > 3 && ira_dump_file != NULL) @@ -2853,14 +2943,42 @@ ira_reassign_pseudos (int *spilled_pseudo_regs, int num, HARD_REG_SET bad_spill_regs, HARD_REG_SET *pseudo_forbidden_regs, - HARD_REG_SET *pseudo_previous_regs, bitmap spilled) + HARD_REG_SET *pseudo_previous_regs, + bitmap spilled) { int i, m, n, regno; bool changed_p; ira_allocno_t a, conflict_a; HARD_REG_SET forbidden_regs; ira_allocno_conflict_iterator aci; + bitmap temp = BITMAP_ALLOC (NULL); + /* Add pseudos which conflict with pseudos already in + SPILLED_PSEUDO_REGS to SPILLED_PSEUDO_REGS. This is preferable + to allocating in two steps as some of the conflicts might have + a higher priority than the pseudos passed in SPILLED_PSEUDO_REGS. */ + for (i = 0; i < num; i++) + bitmap_set_bit (temp, spilled_pseudo_regs[i]); + + for (i = 0, n = num; i < n; i++) + { + int regno = spilled_pseudo_regs[i]; + bitmap_set_bit (temp, regno); + + a = ira_regno_allocno_map[regno]; + FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci) + if (ALLOCNO_HARD_REGNO (conflict_a) < 0 + && ! ALLOCNO_DONT_REASSIGN_P (conflict_a) + && ! bitmap_bit_p (temp, ALLOCNO_REGNO (conflict_a))) + { + spilled_pseudo_regs[num++] = ALLOCNO_REGNO (conflict_a); + bitmap_set_bit (temp, ALLOCNO_REGNO (conflict_a)); + /* ?!? This seems wrong. 
*/ + bitmap_set_bit (consideration_allocno_bitmap, + ALLOCNO_NUM (conflict_a)); + } + } + if (num > 1) qsort (spilled_pseudo_regs, num, sizeof (int), pseudo_reg_compare); changed_p = false; @@ -2878,7 +2996,7 @@ ira_assert (reg_renumber[regno] < 0); if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) fprintf (ira_dump_file, - " Spill %d(a%d), cost=%d", regno, ALLOCNO_NUM (a), + " Try Assign %d(a%d), cost=%d", regno, ALLOCNO_NUM (a), ALLOCNO_MEMORY_COST (a) - ALLOCNO_COVER_CLASS_COST (a)); allocno_reload_assign (a, forbidden_regs); @@ -2887,60 +3005,8 @@ CLEAR_REGNO_REG_SET (spilled, regno); changed_p = true; } - else - spilled_pseudo_regs[m++] = regno; } - if (m == 0) - return changed_p; - if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) - { - fprintf (ira_dump_file, " Spilled regs"); - for (i = 0; i < m; i++) - fprintf (ira_dump_file, " %d", spilled_pseudo_regs[i]); - fprintf (ira_dump_file, "\n"); - } - /* Try to assign hard registers to pseudos conflicting with ones - from SPILLED_PSEUDO_REGS. */ - for (i = n = 0; i < m; i++) - { - regno = spilled_pseudo_regs[i]; - a = ira_regno_allocno_map[regno]; - FOR_EACH_ALLOCNO_CONFLICT (a, conflict_a, aci) - if (ALLOCNO_HARD_REGNO (conflict_a) < 0 - && ! ALLOCNO_DONT_REASSIGN_P (conflict_a) - && ! 
bitmap_bit_p (consideration_allocno_bitmap, - ALLOCNO_NUM (conflict_a))) - { - sorted_allocnos[n++] = conflict_a; - bitmap_set_bit (consideration_allocno_bitmap, - ALLOCNO_NUM (conflict_a)); - } - } - if (n != 0) - { - setup_allocno_priorities (sorted_allocnos, n); - qsort (sorted_allocnos, n, sizeof (ira_allocno_t), - allocno_priority_compare_func); - for (i = 0; i < n; i++) - { - a = sorted_allocnos[i]; - regno = ALLOCNO_REGNO (a); - COPY_HARD_REG_SET (forbidden_regs, bad_spill_regs); - IOR_HARD_REG_SET (forbidden_regs, pseudo_forbidden_regs[regno]); - IOR_HARD_REG_SET (forbidden_regs, pseudo_previous_regs[regno]); - if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) - fprintf (ira_dump_file, - " Try assign %d(a%d), cost=%d", - regno, ALLOCNO_NUM (a), - ALLOCNO_MEMORY_COST (a) - - ALLOCNO_COVER_CLASS_COST (a)); - if (allocno_reload_assign (a, forbidden_regs)) - { - changed_p = true; - bitmap_clear_bit (spilled, regno); - } - } - } + BITMAP_FREE (temp); return changed_p; } Index: ira-build.c =================================================================== --- ira-build.c (revision 155343) +++ ira-build.c (working copy) @@ -482,6 +482,7 @@ ALLOCNO_LIVE_RANGES (a) = NULL; ALLOCNO_MIN (a) = INT_MAX; ALLOCNO_MAX (a) = -1; + ALLOCNO_REQUIRES_REGISTER_SUBSET (a) = 0; ALLOCNO_CONFLICT_ID (a) = ira_allocnos_num; VEC_safe_push (ira_allocno_t, heap, allocno_vec, a); ira_allocnos = VEC_address (ira_allocno_t, allocno_vec); Index: ira-costs.c =================================================================== --- ira-costs.c (revision 155343) +++ ira-costs.c (working copy) @@ -637,6 +637,14 @@ struct costs *pp = op_costs[i], *qq = this_op_costs[i]; int scale = 1 + (recog_data.operand_type[i] == OP_INOUT); + /* catch any allocnos that want bottom_regs and record this for later */ + if (allocno_pref && allocno_pref[ALLOCNO_NUM + (ira_curr_regno_allocno_map + [REGNO (ops[i])])] == BOTTOM_REGS) + { + ALLOCNO_REQUIRES_REGISTER_SUBSET (ira_curr_regno_allocno_map[REGNO 
(ops[i])]) = 1; + } + pp->mem_cost = MIN (pp->mem_cost, (qq->mem_cost + op_cost_add) * scale); Index: reload1.c =================================================================== --- reload1.c (revision 155343) +++ reload1.c (working copy) @@ -48,6 +48,7 @@ #include "df.h" #include "target.h" #include "emit-rtl.h" +#include "ira-int.h" /* This file contains the reload pass of the compiler, which is run after register allocation has been done. It checks that @@ -5720,6 +5721,48 @@ return 0; } +/* Return nonzero if REGNO is a particularly bad choice for reloading X. + (Part of the Jeff Law hack) */ +static int +ira_bad_reload_regno_1 (int regno, rtx x) +{ + int x_regno; + ira_allocno_t a; + enum reg_class pref; + + /* We only deal with pseudo regs. */ + if (! x || GET_CODE (x) != REG) + return 0; + + x_regno = REGNO (x); + if (x_regno < FIRST_PSEUDO_REGISTER) + return 0; + + /* If the pseudo prefers REGNO explicitly, then do not consider + REGNO a bad spill choice. */ + pref = reg_preferred_class (x_regno); + if (reg_class_size[pref] == 1 +&& TEST_HARD_REG_BIT (reg_class_contents[pref], regno)) + return 0; + + /* If the pseudo conflicts with REGNO, then we consider REGNO a + poor choice for a reload regno. */ + a = ira_regno_allocno_map[x_regno]; + if (TEST_HARD_REG_BIT (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a), regno)) + return 1; + + return 0; +} + +/* Return nonzero if REGNO is a particularly bad choice for reloading + IN or OUT. (Part of the Jeff Law hack) */ +int +ira_bad_reload_regno (int regno, rtx in, rtx out) +{ + return (ira_bad_reload_regno_1 (regno, in) + || ira_bad_reload_regno_1 (regno, out)); +} + /* Find a spill register to use as a reload register for reload R. LAST_RELOAD is nonzero if this is the last reload for the insn being processed. @@ -5761,8 +5804,11 @@ run out of reload regs. Suppose we have three reloads, and reloads A and B can share regs. These need two regs. Suppose A and B are given different regs. - That leaves none for C. 
*/ - for (pass = 0; pass < 2; pass++) + That leaves none for C. + + NOTE: There are now three passes in accordance with the Jeff + Law hack. */ + for (pass = 0; pass < 3; pass++) { /* I is the index in spill_regs. We advance it round-robin between insns to use all spill regs @@ -5801,6 +5847,11 @@ && ! TEST_HARD_REG_BIT (reload_reg_used_for_inherit, regnum)))) { + /* BEGIN: Jeff Law's hack */ + if (pass == 1 + && ira_bad_reload_regno (regnum, rld[r].in, rld[r].out)) + continue; + /* END: Jeff Law's hack */ int nr = hard_regno_nregs[regnum][rld[r].mode]; /* Avoid the problem where spilling a GENERAL_OR_FP_REG (on 68000) got us two FP regs. If NR is 1, ^ permalink raw reply [flat|nested] 4+ messages in thread
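[Editor's illustration] To make the patch's central heuristic easier to follow in isolation, here is a standalone sketch of the ABCD ("Allocno BOTTOM_REG Conflict Depth") computation from determine_max_abcd above. The `Allocno` class and its fields are toy stand-ins invented for this sketch, loosely mirroring (but not matching) GCC's allocno structure:

```python
# Toy sketch of the ABCD metric -- NOT GCC code.  At each program point
# in an allocno's live range, count the conflicting allocnos that either
# still need BOTTOM_REGS or already hold one; the max over all points is
# the ABCD.  Register numbers below 16 stand for BOTTOM_REGS, as in the
# patch's `ALLOCNO_HARD_REGNO (...) < 16` test.

class Allocno:
    def __init__(self, ranges, requires_subset=False, hard_regno=-1):
        self.ranges = ranges              # list of (start, finish) pairs
        self.requires_subset = requires_subset  # must have a BOTTOM_REG?
        self.hard_regno = hard_regno      # assigned hard reg, or -1 if none yet
        self.conflicts = []               # conflicting allocnos

def max_abcd(a):
    if a.requires_subset:   # it needs BOTTOM_REGS itself: ABCD is irrelevant
        return 0
    best = 0
    for start, finish in a.ranges:
        for point in range(start, finish + 1):
            # Depth at this point: conflicts that need or hold a BOTTOM_REG
            # and are live here.
            depth = sum(
                1 for c in a.conflicts
                if (c.requires_subset or 0 <= c.hard_regno < 16)
                and any(s <= point <= f for s, f in c.ranges))
            best = max(best, depth)
    return best

a = Allocno([(0, 10)])
c1 = Allocno([(2, 5)], requires_subset=True)   # still wants a BOTTOM_REG
c2 = Allocno([(4, 8)], hard_regno=3)           # already holds one
c3 = Allocno([(20, 30)], requires_subset=True) # never live alongside a
a.conflicts = [c1, c2, c3]
print(max_abcd(a))  # 2: at points 4-5 both c1 and c2 overlap a
```

In the patch, a max ABCD of 15 or more (the number of non-fixed BOTTOM_REGS on this architecture) sets `leave_bottom`, nudging the BOTTOM_REG costs up so the allocno is steered toward TOP_CREGS instead.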
end of thread, other threads:[~2010-01-11 14:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-14 11:15 Understanding IRA (The Story so far) Ian Bolton
2009-12-18 15:33 ` Possible IRA improvements for irregular register architectures Ian Bolton
2010-01-04 14:19   ` Ian Bolton
2010-01-11 14:34     ` Ian Bolton