From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Vladimir Makarov <vmakarov@redhat.com>,
Richard Sandiford <rdsandiford@googlemail.com>
Subject: [PATCH] Account for prologue spills in reg_pressure scheduling
Date: Mon, 20 Oct 2014 07:03:00 -0000 [thread overview]
Message-ID: <FE7CEF4D-A27B-46D0-96B3-3569C1558DF9@linaro.org> (raw)
[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]
Hi,
This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers.
At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers. The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue. However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself. In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers.
The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).
A "thanks" goes to Richard good discussion of the problem and suggestions on the approach to fix it.
The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.
OK to apply?
--
Maxim Kuvyrkov
www.linaro.org
[-- Attachment #2: 0001-sched_class_reg_num.ChangeLog --]
[-- Type: application/octet-stream, Size: 537 bytes --]
Account for prologue spills in reg_pressure scheduling
* haifa-sched.c (sched_class_regs_num, call_used_regs_num): New static
arrays. Use sched_class_regs_num instead of ira_class_hard_regs_num.
(print_curr_reg_pressure, setup_insn_reg_pressure_info,)
(model_update_pressure, model_spill_cost): Use sched_class_regs_num.
(model_start_schedule): Update.
(sched_pressure_start_bb): New static function. Calculate
sched_class_regs_num.
(schedule_block): Use it.
(alloc_global_sched_pressure_data): Calculate call_used_regs_num.
[-- Attachment #3: 0001-sched_class_reg_num.patch --]
[-- Type: application/octet-stream, Size: 8202 bytes --]
From 12e043a184ad6773d3c42baf23bd2003f6ebe72d Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Date: Mon, 20 Oct 2014 05:04:23 +0100
Subject: [PATCH 1/2] sched_class_reg_num
---
gcc/haifa-sched.c | 97 +++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 83 insertions(+), 14 deletions(-)
diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index db8a45c..2b624a1 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -933,6 +933,13 @@ static bitmap saved_reg_live;
/* Registers mentioned in the current region. */
static bitmap region_ref_regs;
+/* Effective number of available registers of a given class (see comment
+ in model_start_schedule). */
+static int sched_class_regs_num[N_REG_CLASSES];
+/* Number of call_used_regs. This is a helper for calculating of
+ sched_class_regs_num. */
+static int call_used_regs_num[N_REG_CLASSES];
+
/* Initiate register pressure relative info for scheduling the current
region. Currently it is only clearing register mentioned in the
current region. */
@@ -1116,7 +1123,7 @@ print_curr_reg_pressure (void)
gcc_assert (curr_reg_pressure[cl] >= 0);
fprintf (sched_dump, " %s:%d(%d)", reg_class_names[cl],
curr_reg_pressure[cl],
- curr_reg_pressure[cl] - ira_class_hard_regs_num[cl]);
+ curr_reg_pressure[cl] - sched_class_regs_num[cl]);
}
fprintf (sched_dump, "\n");
}
@@ -1731,9 +1738,9 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
cl = ira_pressure_classes[i];
gcc_assert (curr_reg_pressure[cl] >= 0);
change = (int) pressure_info[i].set_increase - death[cl];
- before = MAX (0, max_reg_pressure[i] - ira_class_hard_regs_num[cl]);
+ before = MAX (0, max_reg_pressure[i] - sched_class_regs_num[cl]);
after = MAX (0, max_reg_pressure[i] + change
- - ira_class_hard_regs_num[cl]);
+ - sched_class_regs_num[cl]);
hard_regno = ira_class_hard_regs[cl][0];
gcc_assert (hard_regno >= 0);
mode = reg_raw_mode[hard_regno];
@@ -2070,7 +2077,7 @@ model_update_pressure (struct model_pressure_group *group,
/* Check whether the maximum pressure in the overall schedule
has increased. (This means that the MODEL_MAX_PRESSURE of
- every point <= POINT will need to increae too; see below.) */
+ every point <= POINT will need to increase too; see below.) */
if (group->limits[pci].pressure < ref_pressure)
group->limits[pci].pressure = ref_pressure;
@@ -2347,7 +2354,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
/* Return the cost of increasing the pressure in class CL from FROM to TO.
Here we use the very simplistic cost model that every register above
- ira_class_hard_regs_num[CL] has a spill cost of 1. We could use other
+ sched_class_regs_num[CL] has a spill cost of 1. We could use other
measures instead, such as one based on MEMORY_MOVE_COST. However:
(1) In order for an instruction to be scheduled, the higher cost
@@ -2371,7 +2378,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
static int
model_spill_cost (int cl, int from, int to)
{
- from = MAX (from, ira_class_hard_regs_num[cl]);
+ from = MAX (from, sched_class_regs_num[cl]);
return MAX (to, from) - from;
}
@@ -2477,7 +2484,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
bool print_p;
/* Record the baseECC value for each instruction in the model schedule,
- except that negative costs are converted to zero ones now rather thatn
+ except that negative costs are converted to zero ones now rather than
later. Do not assign a cost to debug instructions, since they must
not change code-generation decisions. Experiments suggest we also
get better results by not assigning a cost to instructions from
@@ -3727,15 +3734,13 @@ model_dump_pressure_summary (void)
scheduling region. */
static void
-model_start_schedule (void)
+model_start_schedule (basic_block bb)
{
- basic_block bb;
-
model_next_priority = 1;
model_schedule.create (sched_max_luid);
model_insns = XCNEWVEC (struct model_insn_info, sched_max_luid);
- bb = BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head));
+ gcc_assert (bb == BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head)));
initiate_reg_pressure_info (df_get_live_in (bb));
model_analyze_insns ();
@@ -3773,6 +3778,53 @@ model_end_schedule (void)
model_finalize_pressure_group (&model_before_pressure);
model_schedule.release ();
}
+
+/* Prepare reg pressure scheduling for basic block BB. */
+static void
+sched_pressure_start_bb (basic_block bb)
+{
+ /* Set the number of available registers for each class taking into account
+ relative probability of current basic block versus function prologue and
+ epilogue.
+ * If the basic block executes much more often than the prologue/epilogue
+ (e.g., inside a hot loop), then cost of spill in the prologue is close to
+ nil, so the effective number of available registers is
+ (ira_class_hard_regs_num[cl] - 0).
+ * If the basic block executes as often as the prologue/epilogue,
+ then spill in the block is as costly as in the prologue, so the effective
+ number of available registers is
+ (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).
+ Note that all-else-equal, we prefer to spill in the prologue, since that
+ allows "extra" registers for other basic blocks of the function.
+ * If the basic block is on the cold path of the function and executes
+ rarely, then we should always prefer to spill in the block, rather than
+ in the prologue/epilogue. The effective number of available register is
+ (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]). */
+ {
+ int i;
+ int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
+ int bb_freq = bb->frequency;
+
+ if (bb_freq == 0)
+ {
+ if (entry_freq == 0)
+ entry_freq = bb_freq = 1;
+ }
+ if (bb_freq < entry_freq)
+ bb_freq = entry_freq;
+
+ for (i = 0; i < ira_pressure_classes_num; ++i)
+ {
+ enum reg_class cl = ira_pressure_classes[i];
+ sched_class_regs_num[cl] = ira_class_hard_regs_num[cl];
+ sched_class_regs_num[cl]
+ -= (call_used_regs_num[cl] * entry_freq) / bb_freq;
+ }
+ }
+
+ if (sched_pressure == SCHED_PRESSURE_MODEL)
+ model_start_schedule (bb);
+}
\f
/* A structure that holds local state for the loop in schedule_block. */
struct sched_block_state
@@ -6053,8 +6105,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
in try_ready () (which is called through init_ready_list ()). */
(*current_sched_info->init_ready_list) ();
- if (sched_pressure == SCHED_PRESSURE_MODEL)
- model_start_schedule ();
+ if (sched_pressure)
+ sched_pressure_start_bb (*target_bb);
/* The algorithm is O(n^2) in the number of ready insns at any given
time in the worst case. Before reload we are more likely to have
@@ -6681,7 +6733,7 @@ alloc_global_sched_pressure_data (void)
{
if (sched_pressure != SCHED_PRESSURE_NONE)
{
- int i, max_regno = max_reg_num ();
+ int i, c, max_regno = max_reg_num ();
if (sched_dump != NULL)
/* We need info about pseudos for rtl dumps about pseudo
@@ -6701,6 +6753,23 @@ alloc_global_sched_pressure_data (void)
saved_reg_live = BITMAP_ALLOC (NULL);
region_ref_regs = BITMAP_ALLOC (NULL);
}
+
+ /* Calculate number of CALL_USED_REGS in register classes that
+ we calculate register pressure for. */
+ for (c = 0; c < ira_pressure_classes_num; ++c)
+ {
+ enum reg_class cl = ira_pressure_classes[c];
+ call_used_regs_num[cl] = 0;
+ }
+
+ for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+ if (call_used_regs[i])
+ for (c = 0; c < ira_pressure_classes_num; ++c)
+ {
+ enum reg_class cl = ira_pressure_classes[c];
+ if (ira_class_hard_regs[cl][i])
+ ++call_used_regs_num[cl];
+ }
}
}
--
1.7.9.5
next reply other threads:[~2014-10-20 6:57 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-20 7:03 Maxim Kuvyrkov [this message]
2014-10-20 19:13 ` Sebastian Pop
2014-10-20 19:23 ` Maxim Kuvyrkov
2014-10-20 20:44 ` Sebastian Pop
2014-10-20 20:59 ` Maxim Kuvyrkov
2014-10-20 21:21 ` Richard Sandiford
2014-10-20 21:57 ` Ramana Radhakrishnan
2014-10-20 22:27 ` Maxim Kuvyrkov
2014-10-21 0:01 ` Sebastian Pop
2014-10-20 21:21 ` Sebastian Pop
2014-10-20 22:13 ` Evandro Menezes
2014-10-21 15:27 ` Vladimir Makarov
2014-10-22 7:45 ` Maxim Kuvyrkov
2014-10-22 12:51 ` Richard Sandiford
2014-10-22 14:47 ` Vladimir Makarov
2014-10-23 3:19 ` Maxim Kuvyrkov
2014-10-23 7:25 ` Richard Sandiford
2014-10-23 7:28 ` Maxim Kuvyrkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=FE7CEF4D-A27B-46D0-96B3-3569C1558DF9@linaro.org \
--to=maxim.kuvyrkov@linaro.org \
--cc=gcc-patches@gcc.gnu.org \
--cc=rdsandiford@googlemail.com \
--cc=vmakarov@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).