[PATCH] Account for prologue spills in reg_pressure scheduling

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] Account for prologue spills in reg_pressure scheduling
@ 2014-10-20  7:03 Maxim Kuvyrkov
  2014-10-20 19:13 ` Sebastian Pop
  2014-10-21 15:27 ` Vladimir Makarov
  0 siblings, 2 replies; 18+ messages in thread
From: Maxim Kuvyrkov @ 2014-10-20  7:03 UTC (permalink / raw)
  To: GCC Patches; +Cc: Vladimir Makarov, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

Hi,

This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate the number of available registers.

At the moment the scheduler does not account for spills in the prologue and restores in the epilogue, which occur from use of call-used registers.  The current state is, essentially, optimized for the case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue.  However, at the opposite end, we have the case when the function is just a single non-cyclic basic block, which executes just as often as the prologue/epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself.  In such a case the scheduler should throttle down the number of available registers and try not to go beyond call-clobbered registers.

The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
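
As a rough illustration of that balancing, here is a standalone sketch of the arithmetic that sched_pressure_start_bb in the attached patch performs for one register class.  The function name effective_regs and its parameters are made up for this example: total_regs stands in for ira_class_hard_regs_num[cl] and call_used for call_used_regs_num[cl].  It is not the patch itself and assumes non-negative frequencies.

```c
#include <assert.h>

/* Hypothetical standalone helper; it mirrors the frequency scaling in
   sched_pressure_start_bb but is not part of GCC.
   total_regs  ~ ira_class_hard_regs_num[cl]
   call_used   ~ call_used_regs_num[cl]
   entry_freq  ~ frequency of the function's entry block
   bb_freq     ~ frequency of the block being scheduled  */
static int
effective_regs (int total_regs, int call_used, int entry_freq, int bb_freq)
{
  /* No profile information at all: treat the block like the entry.  */
  if (bb_freq == 0 && entry_freq == 0)
    entry_freq = bb_freq = 1;

  /* Cold blocks are clamped to the entry frequency, so they never get
     fewer registers than (total_regs - call_used).  */
  if (bb_freq < entry_freq)
    bb_freq = entry_freq;

  /* Holds for non-negative inputs after the two adjustments above.  */
  assert (bb_freq > 0);

  /* The hotter the block relative to the entry, the smaller the share
     of call_used that is subtracted (integer division).  */
  return total_regs - (call_used * entry_freq) / bb_freq;
}
```

With made-up numbers (32 registers in the class, 16 of them counted in call_used), a block 100 times hotter than the entry keeps all 32 registers, a block 4 times hotter gets 28, and a block that runs as often as the entry, or less often, gets 16.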

The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).

Thanks go to Richard for a good discussion of the problem and suggestions on the approach to fix it.

The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercise the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.

OK to apply?

--
Maxim Kuvyrkov
www.linaro.org



[-- Attachment #2: 0001-sched_class_reg_num.ChangeLog --]
[-- Type: application/octet-stream, Size: 537 bytes --]

Account for prologue spills in reg_pressure scheduling

	* haifa-sched.c (sched_class_regs_num, call_used_regs_num): New static
	arrays.  Use sched_class_regs_num instead of ira_class_hard_regs_num.
	(print_curr_reg_pressure, setup_insn_reg_pressure_info)
	(model_update_pressure, model_spill_cost): Use sched_class_regs_num.
	(model_start_schedule): Update.
	(sched_pressure_start_bb): New static function.  Calculate
	sched_class_regs_num.
	(schedule_block): Use it.
	(alloc_global_sched_pressure_data): Calculate call_used_regs_num.

[-- Attachment #3: 0001-sched_class_reg_num.patch --]
[-- Type: application/octet-stream, Size: 8202 bytes --]

From 12e043a184ad6773d3c42baf23bd2003f6ebe72d Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Date: Mon, 20 Oct 2014 05:04:23 +0100
Subject: [PATCH 1/2] sched_class_reg_num

---
 gcc/haifa-sched.c |   97 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 83 insertions(+), 14 deletions(-)

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index db8a45c..2b624a1 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -933,6 +933,13 @@ static bitmap saved_reg_live;
 /* Registers mentioned in the current region.  */
 static bitmap region_ref_regs;
 
+/* Effective number of available registers of a given class (see comment
+   in model_start_schedule).  */
+static int sched_class_regs_num[N_REG_CLASSES];
+/* Number of call_used_regs.  This is a helper for calculating
+   sched_class_regs_num.  */
+static int call_used_regs_num[N_REG_CLASSES];
+
 /* Initiate register pressure relative info for scheduling the current
    region.  Currently it is only clearing register mentioned in the
    current region.  */
@@ -1116,7 +1123,7 @@ print_curr_reg_pressure (void)
       gcc_assert (curr_reg_pressure[cl] >= 0);
       fprintf (sched_dump, "  %s:%d(%d)", reg_class_names[cl],
 	       curr_reg_pressure[cl],
-	       curr_reg_pressure[cl] - ira_class_hard_regs_num[cl]);
+	       curr_reg_pressure[cl] - sched_class_regs_num[cl]);
     }
   fprintf (sched_dump, "\n");
 }
@@ -1731,9 +1738,9 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
       cl = ira_pressure_classes[i];
       gcc_assert (curr_reg_pressure[cl] >= 0);
       change = (int) pressure_info[i].set_increase - death[cl];
-      before = MAX (0, max_reg_pressure[i] - ira_class_hard_regs_num[cl]);
+      before = MAX (0, max_reg_pressure[i] - sched_class_regs_num[cl]);
       after = MAX (0, max_reg_pressure[i] + change
-		   - ira_class_hard_regs_num[cl]);
+		   - sched_class_regs_num[cl]);
       hard_regno = ira_class_hard_regs[cl][0];
       gcc_assert (hard_regno >= 0);
       mode = reg_raw_mode[hard_regno];
@@ -2070,7 +2077,7 @@ model_update_pressure (struct model_pressure_group *group,
 
       /* Check whether the maximum pressure in the overall schedule
 	 has increased.  (This means that the MODEL_MAX_PRESSURE of
-	 every point <= POINT will need to increae too; see below.)  */
+	 every point <= POINT will need to increase too; see below.)  */
       if (group->limits[pci].pressure < ref_pressure)
 	group->limits[pci].pressure = ref_pressure;
 
@@ -2347,7 +2354,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 /* Return the cost of increasing the pressure in class CL from FROM to TO.
 
    Here we use the very simplistic cost model that every register above
-   ira_class_hard_regs_num[CL] has a spill cost of 1.  We could use other
+   sched_class_regs_num[CL] has a spill cost of 1.  We could use other
    measures instead, such as one based on MEMORY_MOVE_COST.  However:
 
       (1) In order for an instruction to be scheduled, the higher cost
@@ -2371,7 +2378,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 static int
 model_spill_cost (int cl, int from, int to)
 {
-  from = MAX (from, ira_class_hard_regs_num[cl]);
+  from = MAX (from, sched_class_regs_num[cl]);
   return MAX (to, from) - from;
 }
 
@@ -2477,7 +2484,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
   bool print_p;
 
   /* Record the baseECC value for each instruction in the model schedule,
-     except that negative costs are converted to zero ones now rather thatn
+     except that negative costs are converted to zero ones now rather than
      later.  Do not assign a cost to debug instructions, since they must
      not change code-generation decisions.  Experiments suggest we also
      get better results by not assigning a cost to instructions from
@@ -3727,15 +3734,13 @@ model_dump_pressure_summary (void)
    scheduling region.  */
 
 static void
-model_start_schedule (void)
+model_start_schedule (basic_block bb)
 {
-  basic_block bb;
-
   model_next_priority = 1;
   model_schedule.create (sched_max_luid);
   model_insns = XCNEWVEC (struct model_insn_info, sched_max_luid);
 
-  bb = BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head));
+  gcc_assert (bb == BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head)));
   initiate_reg_pressure_info (df_get_live_in (bb));
 
   model_analyze_insns ();
@@ -3773,6 +3778,53 @@ model_end_schedule (void)
   model_finalize_pressure_group (&model_before_pressure);
   model_schedule.release ();
 }
+
+/* Prepare reg pressure scheduling for basic block BB.  */
+static void
+sched_pressure_start_bb (basic_block bb)
+{
+  /* Set the number of available registers for each class taking into account
+     relative probability of current basic block versus function prologue and
+     epilogue.
+     * If the basic block executes much more often than the prologue/epilogue
+     (e.g., inside a hot loop), then cost of spill in the prologue is close to
+     nil, so the effective number of available registers is
+     (ira_class_hard_regs_num[cl] - 0).
+     * If the basic block executes as often as the prologue/epilogue,
+     then spill in the block is as costly as in the prologue, so the effective
+     number of available registers is
+     (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).
+     Note that all-else-equal, we prefer to spill in the prologue, since that
+     allows "extra" registers for other basic blocks of the function.
+     * If the basic block is on the cold path of the function and executes
+     rarely, then we should always prefer to spill in the block, rather than
+     in the prologue/epilogue.  The effective number of available registers is
+     (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).  */
+  {
+    int i;
+    int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
+    int bb_freq = bb->frequency;
+
+    if (bb_freq == 0)
+      {
+	if (entry_freq == 0)
+	  entry_freq = bb_freq = 1;
+      }
+    if (bb_freq < entry_freq)
+      bb_freq = entry_freq;
+
+    for (i = 0; i < ira_pressure_classes_num; ++i)
+      {
+	enum reg_class cl = ira_pressure_classes[i];
+	sched_class_regs_num[cl] = ira_class_hard_regs_num[cl];
+	sched_class_regs_num[cl]
+	  -= (call_used_regs_num[cl] * entry_freq) / bb_freq;
+      }
+  }
+
+  if (sched_pressure == SCHED_PRESSURE_MODEL)
+    model_start_schedule (bb);
+}
 \f
 /* A structure that holds local state for the loop in schedule_block.  */
 struct sched_block_state
@@ -6053,8 +6105,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
      in try_ready () (which is called through init_ready_list ()).  */
   (*current_sched_info->init_ready_list) ();
 
-  if (sched_pressure == SCHED_PRESSURE_MODEL)
-    model_start_schedule ();
+  if (sched_pressure)
+    sched_pressure_start_bb (*target_bb);
 
   /* The algorithm is O(n^2) in the number of ready insns at any given
      time in the worst case.  Before reload we are more likely to have
@@ -6681,7 +6733,7 @@ alloc_global_sched_pressure_data (void)
 {
   if (sched_pressure != SCHED_PRESSURE_NONE)
     {
-      int i, max_regno = max_reg_num ();
+      int i, c, max_regno = max_reg_num ();
 
       if (sched_dump != NULL)
 	/* We need info about pseudos for rtl dumps about pseudo
@@ -6701,6 +6753,23 @@ alloc_global_sched_pressure_data (void)
 	  saved_reg_live = BITMAP_ALLOC (NULL);
 	  region_ref_regs = BITMAP_ALLOC (NULL);
 	}
+
+      /* Calculate number of CALL_USED_REGS in register classes that
+	 we calculate register pressure for.  */
+      for (c = 0; c < ira_pressure_classes_num; ++c)
+	{
+	  enum reg_class cl = ira_pressure_classes[c];
+	  call_used_regs_num[cl] = 0;
+	}
+
+      for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+	if (call_used_regs[i])
+	  for (c = 0; c < ira_pressure_classes_num; ++c)
+	    {
+	      enum reg_class cl = ira_pressure_classes[c];
+	      if (ira_class_hard_regs[cl][i])
+		++call_used_regs_num[cl];
+	    }
     }
 }
 
-- 
1.7.9.5
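
One detail in the counting loop of alloc_global_sched_pressure_data may be worth double-checking.  Elsewhere in this file (hard_regno = ira_class_hard_regs[cl][0]) the array is indexed by position within the class and yields a hard register number, whereas the new loop indexes it by hard register number and treats the result as a membership flag.  Assuming the positional reading is the right one, a count that walks the members of the class could look like the standalone sketch below.  All names and data here (NUM_HARD_REGS, count_call_used_in_class, the example arrays) are hypothetical stand-ins, not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for FIRST_PSEUDO_REGISTER.  */
#define NUM_HARD_REGS 8

/* Count how many registers of one class are call-used.
   class_regs[0 .. class_size-1] lists the hard register numbers that
   belong to the class (like ira_class_hard_regs[cl] with
   ira_class_hard_regs_num[cl] entries); call_used[] is indexed by hard
   register number (like call_used_regs).  */
static int
count_call_used_in_class (const int *class_regs, int class_size,
			  const bool *call_used)
{
  int n = 0;

  for (int i = 0; i < class_size; ++i)
    {
      int regno = class_regs[i];

      assert (regno >= 0 && regno < NUM_HARD_REGS);
      if (call_used[regno])
	++n;
    }
  return n;
}

/* Made-up example data: the class holds registers 4..7, and registers
   0..5 are call-used, so two members of the class are call-used.  */
static const int example_class_regs[] = { 4, 5, 6, 7 };
static const bool example_call_used[NUM_HARD_REGS]
  = { true, true, true, true, true, true, false, false };
```

Walking the class members also avoids scanning every hard register once per pressure class.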



* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20  7:03 [PATCH] Account for prologue spills in reg_pressure scheduling Maxim Kuvyrkov
@ 2014-10-20 19:13 ` Sebastian Pop
  2014-10-20 19:23   ` Maxim Kuvyrkov
  2014-10-21 15:27 ` Vladimir Makarov
  1 sibling, 1 reply; 18+ messages in thread
From: Sebastian Pop @ 2014-10-20 19:13 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC Patches, Vladimir Makarov, Richard Sandiford

Maxim Kuvyrkov wrote:
> Hi,
> 
> This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers.
> 
> At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers.  The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue.  However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself.  In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers.
> 
> The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
> 
> The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).
> 

Does aarch64 enable reg pressure sched by default, or what is the flag to enable it?
I'm planning to look at the perf impact of the patch.

Thanks,
Sebastian


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 19:13 ` Sebastian Pop
@ 2014-10-20 19:23   ` Maxim Kuvyrkov
  2014-10-20 20:44     ` Sebastian Pop
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Kuvyrkov @ 2014-10-20 19:23 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: GCC Patches, Vladimir Makarov, Richard Sandiford

On Oct 21, 2014, at 8:11 AM, Sebastian Pop <sebpop@gmail.com> wrote:

> Maxim Kuvyrkov wrote:
>> Hi,
>> 
>> This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers.
>> 
>> At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers.  The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue. However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself.  In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers.
>> 
>> The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
>> 
>> The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).
>> 
> 
> Does aarch64 enable reg pressure sched by default, or what is the flag to enable it?
> I'm planing to look at the perf impact of the patch.

Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure scheduling by default.  Use "-fsched-pressure --param=sched-pressure-algorithm=2" to enable the same thing as on ARM.  I would imagine C++ and Fortran floating-point code to be most affected.

--
Maxim Kuvyrkov
www.linaro.org


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 19:23   ` Maxim Kuvyrkov
@ 2014-10-20 20:44     ` Sebastian Pop
  2014-10-20 20:59       ` Maxim Kuvyrkov
  0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Pop @ 2014-10-20 20:44 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC Patches, Vladimir Makarov, Richard Sandiford

Hi Maxim,

Maxim Kuvyrkov wrote:
>  Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure
> scheduling by default.  Use "-fsched-pressure
> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I would
> imagine C++ and Fortran floating-point code to be most affected.

On aarch64 I only see perf improvements with your patch: no perf degradations in
any of the tests that I have run.

base0: r216447, -O3
base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
patch: r216447 + your patch, -O3 -fsched-pressure --param=sched-pressure-algorithm=2

patch vs. base1 is only an improvement.

base1 vs. base0 has a few good improvements, and some small degradations: your
patch improves the perf for one of the degradations to the point it is better
now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3.

Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by
default for aarch64?

Thanks,
Sebastian


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 20:44     ` Sebastian Pop
@ 2014-10-20 20:59       ` Maxim Kuvyrkov
  2014-10-20 21:21         ` Richard Sandiford
                           ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Maxim Kuvyrkov @ 2014-10-20 20:59 UTC (permalink / raw)
  To: Sebastian Pop
  Cc: GCC Patches, Vladimir Makarov, Richard Sandiford,
	Ramana Radhakrishnan, Marcus Shawcroft, Richard Earnshaw

[Adding ARM maintainers to CC]

On Oct 21, 2014, at 9:44 AM, Sebastian Pop <sebpop@gmail.com> wrote:

> Hi Maxim,
> 
> Maxim Kuvyrkov wrote:
>> Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure
>> scheduling by default.  Use "-fsched-pressure
>> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I would
>> imagine C++ and Fortran floating-point code to be most affected.
> 
> On aarch64 I only see perf improvements with your patch: no perf degradations on
> all the tests that I have run.
> 
> base0: r216447, -O3
> base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
> patch: r216447 + your patch, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
> 
> patch vs. base1 is only an improvement.
> 
> base1 vs. base0 has a few good improvements, and some small degradations: your
> patch improves the perf for one of the degradations to the point it is better
> now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3.
> 
> Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by
> default for aarch64?

These are great results, yay!  Sebastian, what benchmarks did you run?

We need to see improvements on spec2k / spec2k6 to enable register-pressure scheduling on AArch64 by default.  The current understanding is that AArch64 has enough registers to not benefit from pressure-aware scheduling.  On the other hand, one could argue that cores with more complex pipelines (e.g., A57) might not benefit from pipeline-oriented scheduling either, and, therefore, scheduling for register pressure can provide a better win.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 20:59       ` Maxim Kuvyrkov
@ 2014-10-20 21:21         ` Richard Sandiford
  2014-10-20 21:57           ` Ramana Radhakrishnan
  2014-10-20 21:21         ` Sebastian Pop
  2014-10-20 22:13         ` Evandro Menezes
  2 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2014-10-20 21:21 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: Sebastian Pop, GCC Patches, Vladimir Makarov,
	Ramana Radhakrishnan, Marcus Shawcroft, Richard Earnshaw

Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> writes:
> [Adding ARM maintainers to CC]
>
> On Oct 21, 2014, at 9:44 AM, Sebastian Pop <sebpop@gmail.com> wrote:
>
>> Hi Maxim,
>> 
>> Maxim Kuvyrkov wrote:
>>> Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure
>>> scheduling by default.  Use "-fsched-pressure
>>> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I would
>>> imagine C++ and Fortran floating-point code to be most affected.
>> 
>> On aarch64 I only see perf improvements with your patch: no perf
>> degradations on
>> all the tests that I have run.
>> 
>> base0: r216447, -O3
>> base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
>> patch: r216447 + your patch, -O3 -fsched-pressure
>> --param=sched-pressure-algorithm=2
>> 
>> patch vs. base1 is only an improvement.
>> 
>> base1 vs. base0 has a few good improvements, and some small degradations: your
>> patch improves the perf for one of the degradations to the point it is better
>> now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3.
>> 
>> Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by
>> default for aarch64?
>
> These are great results, yay!

+1.  Thanks for running these tests.  If you have time, it'd also be
interesting to try the same thing with --param=sched-pressure-algorithm=1
(which should be equivalent to not having the --param, but better safe
than sorry).  Is algorithm 1 or algorithm 2 better for aarch64?

Thanks,
Richard


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 20:59       ` Maxim Kuvyrkov
  2014-10-20 21:21         ` Richard Sandiford
@ 2014-10-20 21:21         ` Sebastian Pop
  2014-10-20 22:13         ` Evandro Menezes
  2 siblings, 0 replies; 18+ messages in thread
From: Sebastian Pop @ 2014-10-20 21:21 UTC (permalink / raw)
  To: Maxim Kuvyrkov
  Cc: GCC Patches, Vladimir Makarov, Richard Sandiford,
	Ramana Radhakrishnan, Marcus Shawcroft, Richard Earnshaw

Maxim Kuvyrkov wrote:
> [Adding ARM maintainers to CC]
> 
> On Oct 21, 2014, at 9:44 AM, Sebastian Pop <sebpop@gmail.com> wrote:
> 
> > Hi Maxim,
> > 
> > Maxim Kuvyrkov wrote:
> >> Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure
> >> scheduling by default.  Use "-fsched-pressure
> >> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I would
> >> imagine C++ and Fortran floating-point code to be most affected.
> > 
> > On aarch64 I only see perf improvements with your patch: no perf degradations on
> > all the tests that I have run.
> > 
> > base0: r216447, -O3
> > base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
> > patch: r216447 + your patch, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
> > 
> > patch vs. base1 is only an improvement.
> > 
> > base1 vs. base0 has a few good improvements, and some small degradations: your
> > patch improves the perf for one of the degradations to the point it is better
> > now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3.
> > 
> > Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by
> > default for aarch64?
> 
> These are great results, yay!  Sebastian, what benchmarks did you run?

I have run Geekbench and some other benchmarks.

> We need to see improvements on spec2k / spec2k6 to enable register-pressure
> scheduling on AArch64 by default.  The current understanding is that AArch64

I don't have the data for spec2k / spec2k6.

> has enough registers to not benefit from pressure-aware scheduling.  On the
> other hand, one could argue that cores with more complex pipelines (e.g., A57)
> might not benefit from pipeline-oriented scheduling either, and, therefore,
> scheduling for register pressure can provide a better win.

I have seen the improvements on A57.

Sebastian


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 21:21         ` Richard Sandiford
@ 2014-10-20 21:57           ` Ramana Radhakrishnan
  2014-10-20 22:27             ` Maxim Kuvyrkov
  2014-10-21  0:01             ` Sebastian Pop
  0 siblings, 2 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2014-10-20 21:57 UTC (permalink / raw)
  To: Maxim Kuvyrkov, Sebastian Pop, GCC Patches, Vladimir Makarov,
	Ramana Radhakrishnan, Marcus Shawcroft, Richard Earnshaw,
	Richard Sandiford

On Mon, Oct 20, 2014 at 10:17 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> writes:
>> [Adding ARM maintainers to CC]
>>
>> On Oct 21, 2014, at 9:44 AM, Sebastian Pop <sebpop@gmail.com> wrote:
>>
>>> Hi Maxim,
>>>
>>> Maxim Kuvyrkov wrote:
>>>> Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure
>>>> scheduling by default.  Use "-fsched-pressure
>>>> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I would
>>>> imagine C++ and Fortran floating-point code to be most affected.
>>>
>>> On aarch64 I only see perf improvements with your patch: no perf
>>> degradations on
>>> all the tests that I have run.
>>>
>>> base0: r216447, -O3
>>> base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
>>> patch: r216447 + your patch, -O3 -fsched-pressure
>>> --param=sched-pressure-algorithm=2
>>>
>>> patch vs. base1 is only an improvement.
>>>
>>> base1 vs. base0 has a few good improvements, and some small degradations: your
>>> patch improves the perf for one of the degradations to the point it is better
>>> now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3.
>>>
>>> Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by
>>> default for aarch64?

We already have -fsched-pressure --param=sched-pressure-algorithm=1 on
by default in the AArch64 backend from September.
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01663.html went in a few
days back.

So the uplifts we are looking at here come from this patch combined with
--param=sched-pressure-algorithm=2.  To a large degree turning on
algorithm #2 is a benchmarking exercise and IMHO should happen along
with the sched-pressure tweaks that you are currently doing.  Moving to
the same algorithm as the ARM backend would be nice, provided we can
deal with any performance regressions that appear.  However, without
seeing behaviour on some more benchmarks like SPEC2k(6) it would be
unwise to switch this on by default.  We can run this and let you know
the results, though SPECFP2k6 takes quite a while - are all your
patches to sched-pressure now done?

>>
>> These are great results, yay!
>
> +1.  Thanks for running these tests.  If you have time, it'd also be
> interesting to try the same thing with --param=sched-pressure-algorithm=1
> (which should be equivalent to not having the --param, but better safe
> than sorry).  Is algorithm 1 or algorithm 2 better for aarch64?

Sebastian's results indicate algorithm #2 + Maxim's patches are better
but we probably need some more benchmarking.

Promising results though - thanks for getting these out, Maxim.

Ramana

>
> Thanks,
> Richard


* RE: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 20:59       ` Maxim Kuvyrkov
  2014-10-20 21:21         ` Richard Sandiford
  2014-10-20 21:21         ` Sebastian Pop
@ 2014-10-20 22:13         ` Evandro Menezes
  2 siblings, 0 replies; 18+ messages in thread
From: Evandro Menezes @ 2014-10-20 22:13 UTC (permalink / raw)
  To: 'Maxim Kuvyrkov', 'Sebastian Pop'
  Cc: 'GCC Patches', 'Vladimir Makarov',
	'Richard Sandiford', 'Ramana Radhakrishnan',
	'Marcus Shawcroft', 'Richard Earnshaw'

> [Adding ARM maintainers to CC]
> 
> On Oct 21, 2014, at 9:44 AM, Sebastian Pop <sebpop@gmail.com> wrote:
> 
> > Hi Maxim,
> >
> > Maxim Kuvyrkov wrote:
> >> Thanks, benchmarking results are welcome!  AArch64 doesn't use
> >> reg_pressure scheduling by default.  Use "-fsched-pressure
> >> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.
> >> I would imagine C++ and Fortran floating-point code to be most
> >> affected.
> >
> > On aarch64 I only see perf improvements with your patch: no perf
> > degradations on all the tests that I have run.
> >
> > base0: r216447, -O3
> > base1: r216447, -O3 -fsched-pressure
> > --param=sched-pressure-algorithm=2
> > patch: r216447 + your patch, -O3 -fsched-pressure
> > --param=sched-pressure-algorithm=2
> >
> > patch vs. base1 is only an improvement.
> >
> > base1 vs. base0 has a few good improvements, and some small
> > degradations: your patch improves the perf for one of the degradations
> > to the point it is better now with -fsched-pressure
> > --param=sched-pressure-algorithm=2 than at -O3.
> >
> > Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2"
> > by default for aarch64?
> 
> These are great results, yay!  Sebastian, what benchmarks did you run?
> 
> We need to see improvements on spec2k / spec2k6 to enable register-pressure
> scheduling on AArch64 by default.  The current understanding is that AArch64
> has enough registers to not benefit from pressure-aware scheduling.  On the
> other hand, one could argue that cores with more complex pipelines (e.g.,
> A57) might not benefit from pipeline-oriented scheduling either, and,
> therefore, scheduling for register pressure can provide a better win.

Cores with complex pipelines might benefit less from such scheduling, but if
x86-64, which is as complex as it gets, generally benefitted from other
scheduling algorithms at least, I'd wager that A57 could experience some
boost as well.

Cheers, 

-- 
Evandro Menezes                              Austin, TX


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 21:57           ` Ramana Radhakrishnan
@ 2014-10-20 22:27             ` Maxim Kuvyrkov
  2014-10-21  0:01             ` Sebastian Pop
  1 sibling, 0 replies; 18+ messages in thread
From: Maxim Kuvyrkov @ 2014-10-20 22:27 UTC (permalink / raw)
  To: ramrad01
  Cc: Sebastian Pop, GCC Patches, Vladimir Makarov,
	Ramana Radhakrishnan, Marcus Shawcroft, Richard Earnshaw,
	Richard Sandiford

On Oct 21, 2014, at 10:39 AM, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:

> On Mon, Oct 20, 2014 at 10:17 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>> Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> writes:
>>> [Adding ARM maintainers to CC]
>>> 
>>> On Oct 21, 2014, at 9:44 AM, Sebastian Pop <sebpop@gmail.com> wrote:
>>> 
>>>> Hi Maxim,
>>>> 
>>>> Maxim Kuvyrkov wrote:
>>>>> Thanks, benchmarking results are welcome!  AArch64 doesn't use reg_pressure
>>>>> scheduling by default.  Use "-fsched-pressure
>>>>> --param=sched-pressure-algorithm=2" to enable same thing as on ARM.  I would
>>>>> imagine C++ and Fortran floating-point code to be most affected.
>>>> 
>>>> On aarch64 I only see perf improvements with your patch: no perf
>>>> degradations on
>>>> all the tests that I have run.
>>>> 
>>>> base0: r216447, -O3
>>>> base1: r216447, -O3 -fsched-pressure --param=sched-pressure-algorithm=2
>>>> patch: r216447 + your patch, -O3 -fsched-pressure
>>>> --param=sched-pressure-algorithm=2
>>>> 
>>>> patch vs. base1 is only an improvement.
>>>> 
>>>> base1 vs. base0 has a few good improvements, and some small degradations: your
>>>> patch improves the perf for one of the degradations to the point it is better
>>>> now with -fsched-pressure --param=sched-pressure-algorithm=2 than at -O3.
>>>> 
>>>> Could we turn on "-fsched-pressure --param=sched-pressure-algorithm=2" by
>>>> default for aarch64?
> 
> We already have sched-pressure --param=sched-pressure-algorithm=1 on
> by default in the AArch64 backend from September.
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01663.html went in a few
> days back.

Oh, good, I didn't notice AArch64 using sched-pressure because it uses the "weighted" algorithm.  This patch affects both the "weighted" and "model" algorithms, so it doesn't necessarily make the "model" algorithm better than "weighted".  That said, my personal preference is for both AArch32 and AArch64 to use the "model" algorithm, but, as you said, we need benchmarking data to make that decision.

...
>  We can run this and let
> you know the results, though SPECFP2k6 takes quite a while - are all
> your patches to sched-pressure now done ?

Please hold off on benchmarking for 1-2 days.  The sched-pressure patches are done now, but there are more patches for the 2nd scheduler pass in the queue (from the linaro-dev/sched-model-prefetch branch).  Hopefully those will be posted here today.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20 21:57           ` Ramana Radhakrishnan
  2014-10-20 22:27             ` Maxim Kuvyrkov
@ 2014-10-21  0:01             ` Sebastian Pop
  1 sibling, 0 replies; 18+ messages in thread
From: Sebastian Pop @ 2014-10-21  0:01 UTC (permalink / raw)
  To: ramrad01
  Cc: Maxim Kuvyrkov, GCC Patches, Vladimir Makarov,
	Ramana Radhakrishnan, Marcus Shawcroft, Richard Earnshaw,
	Richard Sandiford

Ramana Radhakrishnan wrote:
> We already have sched-pressure --param=sched-pressure-algorithm=1 on
> by default in the AArch64 backend from September.
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01663.html went in a few
> days back.
> 
>  So if this patch is on then we are looking at uplifts with
> sched-pressure-algorithm=2 patch and --param

Right: I ran "-O3" vs. "-O3 -fsched-pressure --param=sched-pressure-algorithm=1"
and the numbers are identical.

> sched-pressure-algorithm=2. To a large degree turning on algorithm #2
> is a benchmarking exercise and IMHO should happen along with the
> sched-pressure tweaks that you are currently doing.  I would suggest
> moving to the same algorithm as the ARM backend would be nice and if
> we can deal with any performance regressions that appear. However
> without seeing behaviour on some more benchmarks like SPEC2k(6) it
> would be unwise to switch this on by default . We can run this and let
> you know the results, though SPECFP2k6 takes quite a while - are all
> your patches to sched-pressure now done ?
> 
> >>
> >> These are great results, yay!
> >
> > +1.  Thanks for running these tests.  If you have time, it'd also be
> > interesting to try the same thing with --param=sched-pressure-algorithm=1
> > (which should be equivalent to not having the --param, but better safe
> > than sorry).  Is algorithm 1 or algorithm 2 better for aarch64?

When testing Maxim's patch + --param=sched-pressure-algorithm=1
I see more perf degradations than speedups.

> Sebastian's results indicate algorithm #2 + Maxim's patches are better
> but we probably need some more benchmarking.

Overall algorithm #2 produces better results than algorithm #1.  Maxim's patch
is nicely improving the perf of algorithm #2.

Thanks,
Sebastian

* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-20  7:03 [PATCH] Account for prologue spills in reg_pressure scheduling Maxim Kuvyrkov
  2014-10-20 19:13 ` Sebastian Pop
@ 2014-10-21 15:27 ` Vladimir Makarov
  2014-10-22  7:45   ` Maxim Kuvyrkov
  1 sibling, 1 reply; 18+ messages in thread
From: Vladimir Makarov @ 2014-10-21 15:27 UTC (permalink / raw)
  To: Maxim Kuvyrkov, GCC Patches; +Cc: Richard Sandiford

On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote:
> Hi,
>
> This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers.
>
> At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers.  The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue.  However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself.  In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers.
>
> The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
>
> The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).
>
> A "thanks" goes to Richard good discussion of the problem and suggestions on the approach to fix it.
>
> The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.
>
> OK to apply?
>
It is a pretty interesting idea for a heuristic, Maxim.

But I don't understand the following loop:

+      for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+	if (call_used_regs[i])
+	  for (c = 0; c < ira_pressure_classes_num; ++c)
+	    {
+	      enum reg_class cl = ira_pressure_classes[c];
+	      if (ira_class_hard_regs[cl][i])
+		++call_used_regs_num[cl];


ira_class_hard_regs[cl] is an array containing the hard registers belonging
to class CL.  So if GENERAL_REGS consists of hard regs 0..3, 12..15, the
array will contain 8 elements: 0..3, 12..15.  The array size is given
by ira_class_hard_regs_num[cl].  So the index is the ordinal position of a
hard reg within the class (starting from 0), not the hard register number
itself.  Also, the pressure classes never intersect, so you can stop the
inner loop once you find the class to which the hard reg belongs.

I believe you should rewrite the code and get performance results again
to get an approval.   You also missed the changelog.
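
To make the indexing point concrete, here is a minimal stand-alone sketch.
The table and function names below are hypothetical stand-ins for
ira_class_hard_regs[cl] and ira_class_hard_regs_num[cl], not GCC's real
definitions, and the register layout simply reuses the GENERAL_REGS example
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target with 16 hard registers (an assumption for this
   sketch, not taken from any real backend).  */
enum { NUM_HARD_REGS = 16 };

/* Position-indexed list of the hard registers in one class, in the
   style of ira_class_hard_regs[cl]: 8 entries holding 0..3, 12..15.  */
static const short class_hard_regs[] = { 0, 1, 2, 3, 12, 13, 14, 15 };
static const int class_hard_regs_num = 8;

/* Return true if hard register REGNO belongs to the class.  The list is
   searched by position; REGNO is never used as an index.  */
static bool
regno_in_class_p (int regno)
{
  assert (regno >= 0 && regno < NUM_HARD_REGS);
  for (int j = 0; j < class_hard_regs_num; ++j)
    if (class_hard_regs[j] == regno)
      return true;
  return false;
}

/* The mistaken test: REGNO is used as an index and the stored register
   number is tested for non-zero.  A bounds check is added here only to
   keep this sketch well defined.  */
static bool
buggy_regno_in_class_p (int regno)
{
  assert (regno >= 0 && regno < NUM_HARD_REGS);
  return regno < class_hard_regs_num && class_hard_regs[regno] != 0;
}
```

With these tables, register 0 is in the class but the mistaken test rejects
it (the entry at position 0 holds register number 0), while register 5 is
not in the class but the mistaken test accepts it (position 5 holds
register 13).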

* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-21 15:27 ` Vladimir Makarov
@ 2014-10-22  7:45   ` Maxim Kuvyrkov
  2014-10-22 12:51     ` Richard Sandiford
  2014-10-22 14:47     ` Vladimir Makarov
  0 siblings, 2 replies; 18+ messages in thread
From: Maxim Kuvyrkov @ 2014-10-22  7:45 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: GCC Patches, Richard Sandiford, Sebastian Pop

[-- Attachment #1: Type: text/plain, Size: 3181 bytes --]

On Oct 22, 2014, at 4:24 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:

> On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote:
>> Hi,
>> 
>> This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers.
>> 
>> At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers.  The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue.  However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself.  In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers.
>> 
>> The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
>> 
>> The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).
>> 
>> A "thanks" goes to Richard good discussion of the problem and suggestions on the approach to fix it.
>> 
>> The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.
>> 
>> OK to apply?
>> 
> It is a pretty interesting idea for heuristic, Maxim.
> 
> But I don't understand the following loop:
> 
> +      for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
> +	if (call_used_regs[i])
> +	  for (c = 0; c < ira_pressure_classes_num; ++c)
> +	    {
> +	      enum reg_class cl = ira_pressure_classes[c];
> +	      if (ira_class_hard_regs[cl][i])
> +		++call_used_regs_num[cl];
> 
> 
> ira_class_hard_regs[cl] is array containing hard registers belonging to
> class CL.  So if GENERAL_REGS consists of hard regs 0..3, 12..15,  the
> array will contain 8 elements 0..3, 12..15.  The array size is defined
> by ira_class_hard_regs_num[cl].  So the index is order number of hard
> reg in the class (starting from 0) but not hard register number itself. 
> Also the pressure classes never intersect so you can stop the inner loop
> when you find class to which hard reg belongs to.

Thanks for spotting this.  Indeed, this is a bug, but it still happened to calculate the number of call-used registers correctly for ARM (where I debugged the implementation).

> 
> I believe you should rewrite the code and get performance results again
> to get an approval.

Sebastian, could you run Geekbench again to make sure you see the same performance numbers?

>   You also missed the changelog.
> 

The changelog was in a separate file; it is attached here along with the fixed patch.  Bootstrapped on x86_64-linux-gnu; bootstrap and regtest on arm-linux-gnueabihf are in progress.

--
Maxim Kuvyrkov
www.linaro.org



[-- Attachment #2: 0001-sched_class_reg_num.ChangeLog --]
[-- Type: application/octet-stream, Size: 537 bytes --]

Account for prologue spills in reg_pressure scheduling

	* haifa-sched.c (sched_class_regs_num, call_used_regs_num): New static
	arrays.  Use sched_class_regs_num instead of ira_class_hard_regs_num.
	(print_curr_reg_pressure, setup_insn_reg_pressure_info,)
	(model_update_pressure, model_spill_cost): Use sched_class_regs_num.
	(model_start_schedule): Update.
	(sched_pressure_start_bb): New static function.  Calculate
	sched_class_regs_num.
	(schedule_block): Use it.
	(alloc_global_sched_pressure_data): Calculate call_used_regs_num.

[-- Attachment #3: 0001-sched_class_reg_num.patch --]
[-- Type: application/octet-stream, Size: 8444 bytes --]

From 1643c4f2ec40feeb4987f5a000fac01304a31c1b Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Date: Mon, 20 Oct 2014 05:04:23 +0100
Subject: [PATCH 1/7] sched_class_reg_num

---
 gcc/haifa-sched.c |  105 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 91 insertions(+), 14 deletions(-)

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index db8a45c..68d0b64 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -933,6 +933,13 @@ static bitmap saved_reg_live;
 /* Registers mentioned in the current region.  */
 static bitmap region_ref_regs;
 
+/* Effective number of available registers of a given class (see comment
+   in sched_pressure_start_bb).  */
+static int sched_class_regs_num[N_REG_CLASSES];
+/* Number of call_used_regs.  This is a helper for calculating of
+   sched_class_regs_num.  */
+static int call_used_regs_num[N_REG_CLASSES];
+
 /* Initiate register pressure relative info for scheduling the current
    region.  Currently it is only clearing register mentioned in the
    current region.  */
@@ -1116,7 +1123,7 @@ print_curr_reg_pressure (void)
       gcc_assert (curr_reg_pressure[cl] >= 0);
       fprintf (sched_dump, "  %s:%d(%d)", reg_class_names[cl],
 	       curr_reg_pressure[cl],
-	       curr_reg_pressure[cl] - ira_class_hard_regs_num[cl]);
+	       curr_reg_pressure[cl] - sched_class_regs_num[cl]);
     }
   fprintf (sched_dump, "\n");
 }
@@ -1731,9 +1738,9 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
       cl = ira_pressure_classes[i];
       gcc_assert (curr_reg_pressure[cl] >= 0);
       change = (int) pressure_info[i].set_increase - death[cl];
-      before = MAX (0, max_reg_pressure[i] - ira_class_hard_regs_num[cl]);
+      before = MAX (0, max_reg_pressure[i] - sched_class_regs_num[cl]);
       after = MAX (0, max_reg_pressure[i] + change
-		   - ira_class_hard_regs_num[cl]);
+		   - sched_class_regs_num[cl]);
       hard_regno = ira_class_hard_regs[cl][0];
       gcc_assert (hard_regno >= 0);
       mode = reg_raw_mode[hard_regno];
@@ -2070,7 +2077,7 @@ model_update_pressure (struct model_pressure_group *group,
 
       /* Check whether the maximum pressure in the overall schedule
 	 has increased.  (This means that the MODEL_MAX_PRESSURE of
-	 every point <= POINT will need to increae too; see below.)  */
+	 every point <= POINT will need to increase too; see below.)  */
       if (group->limits[pci].pressure < ref_pressure)
 	group->limits[pci].pressure = ref_pressure;
 
@@ -2347,7 +2354,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 /* Return the cost of increasing the pressure in class CL from FROM to TO.
 
    Here we use the very simplistic cost model that every register above
-   ira_class_hard_regs_num[CL] has a spill cost of 1.  We could use other
+   sched_class_regs_num[CL] has a spill cost of 1.  We could use other
    measures instead, such as one based on MEMORY_MOVE_COST.  However:
 
       (1) In order for an instruction to be scheduled, the higher cost
@@ -2371,7 +2378,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 static int
 model_spill_cost (int cl, int from, int to)
 {
-  from = MAX (from, ira_class_hard_regs_num[cl]);
+  from = MAX (from, sched_class_regs_num[cl]);
   return MAX (to, from) - from;
 }
 
@@ -2477,7 +2484,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
   bool print_p;
 
   /* Record the baseECC value for each instruction in the model schedule,
-     except that negative costs are converted to zero ones now rather thatn
+     except that negative costs are converted to zero ones now rather than
      later.  Do not assign a cost to debug instructions, since they must
      not change code-generation decisions.  Experiments suggest we also
      get better results by not assigning a cost to instructions from
@@ -3727,15 +3734,13 @@ model_dump_pressure_summary (void)
    scheduling region.  */
 
 static void
-model_start_schedule (void)
+model_start_schedule (basic_block bb)
 {
-  basic_block bb;
-
   model_next_priority = 1;
   model_schedule.create (sched_max_luid);
   model_insns = XCNEWVEC (struct model_insn_info, sched_max_luid);
 
-  bb = BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head));
+  gcc_assert (bb == BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head)));
   initiate_reg_pressure_info (df_get_live_in (bb));
 
   model_analyze_insns ();
@@ -3773,6 +3778,53 @@ model_end_schedule (void)
   model_finalize_pressure_group (&model_before_pressure);
   model_schedule.release ();
 }
+
+/* Prepare reg pressure scheduling for basic block BB.  */
+static void
+sched_pressure_start_bb (basic_block bb)
+{
+  /* Set the number of available registers for each class taking into account
+     relative probability of current basic block versus function prologue and
+     epilogue.
+     * If the basic block executes much more often than the prologue/epilogue
+     (e.g., inside a hot loop), then cost of spill in the prologue is close to
+     nil, so the effective number of available registers is
+     (ira_class_hard_regs_num[cl] - 0).
+     * If the basic block executes as often as the prologue/epilogue,
+     then spill in the block is as costly as in the prologue, so the effective
+     number of available registers is
+     (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).
+     Note that all-else-equal, we prefer to spill in the prologue, since that
+     allows "extra" registers for other basic blocks of the function.
+     * If the basic block is on the cold path of the function and executes
+     rarely, then we should always prefer to spill in the block, rather than
+     in the prologue/epilogue.  The effective number of available register is
+     (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).  */
+  {
+    int i;
+    int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
+    int bb_freq = bb->frequency;
+
+    if (bb_freq == 0)
+      {
+	if (entry_freq == 0)
+	  entry_freq = bb_freq = 1;
+      }
+    if (bb_freq < entry_freq)
+      bb_freq = entry_freq;
+
+    for (i = 0; i < ira_pressure_classes_num; ++i)
+      {
+	enum reg_class cl = ira_pressure_classes[i];
+	sched_class_regs_num[cl] = ira_class_hard_regs_num[cl];
+	sched_class_regs_num[cl]
+	  -= (call_used_regs_num[cl] * entry_freq) / bb_freq;
+      }
+  }
+
+  if (sched_pressure == SCHED_PRESSURE_MODEL)
+    model_start_schedule (bb);
+}
 \f
 /* A structure that holds local state for the loop in schedule_block.  */
 struct sched_block_state
@@ -6053,8 +6105,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
      in try_ready () (which is called through init_ready_list ()).  */
   (*current_sched_info->init_ready_list) ();
 
-  if (sched_pressure == SCHED_PRESSURE_MODEL)
-    model_start_schedule ();
+  if (sched_pressure)
+    sched_pressure_start_bb (*target_bb);
 
   /* The algorithm is O(n^2) in the number of ready insns at any given
      time in the worst case.  Before reload we are more likely to have
@@ -6681,7 +6733,7 @@ alloc_global_sched_pressure_data (void)
 {
   if (sched_pressure != SCHED_PRESSURE_NONE)
     {
-      int i, max_regno = max_reg_num ();
+      int i, c, max_regno = max_reg_num ();
 
       if (sched_dump != NULL)
 	/* We need info about pseudos for rtl dumps about pseudo
@@ -6701,6 +6753,31 @@ alloc_global_sched_pressure_data (void)
 	  saved_reg_live = BITMAP_ALLOC (NULL);
 	  region_ref_regs = BITMAP_ALLOC (NULL);
 	}
+
+      /* Calculate number of CALL_USED_REGS in register classes that
+	 we calculate register pressure for.  */
+      for (c = 0; c < ira_pressure_classes_num; ++c)
+	{
+	  enum reg_class cl = ira_pressure_classes[c];
+	  call_used_regs_num[cl] = 0;
+	}
+
+      for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+	if (call_used_regs[i])
+	  for (c = 0; c < ira_pressure_classes_num; ++c)
+	    {
+	      int j;
+	      enum reg_class cl = ira_pressure_classes[c];
+
+	      for (j = 0; j < ira_class_hard_regs_num[cl]; ++j)
+		if (ira_class_hard_regs[cl][j] == i)
+		  {
+		    /* Register I belongs to pressure class CL.  Pressure
+		       classes do not intersect, so don't look further.  */
+		    ++call_used_regs_num[cl];
+		    break;
+		  }
+	    }
     }
 }
 
-- 
1.7.9.5
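
The frequency scaling done in sched_pressure_start_bb can be tried out in
isolation.  The following is a minimal sketch, assuming plain integer
frequencies; the function name effective_regs_num and its parameters are
hypothetical and only mirror the arithmetic of the patch, they are not part
of haifa-sched.c.

```c
#include <assert.h>

/* Return the effective number of registers available to the scheduler
   in a block with frequency BB_FREQ, given HARD_REGS_NUM registers in
   the class, CALL_USED_NUM of which are counted by the patch, and a
   function entry frequency of ENTRY_FREQ.  Hypothetical helper that
   mirrors the arithmetic in sched_pressure_start_bb.  */
static int
effective_regs_num (int hard_regs_num, int call_used_num,
		    int entry_freq, int bb_freq)
{
  assert (entry_freq >= 0 && bb_freq >= 0);

  /* Avoid dividing by zero when no frequency information exists.  */
  if (bb_freq == 0)
    {
      if (entry_freq == 0)
	entry_freq = bb_freq = 1;
    }

  /* A block colder than the entry is treated like the entry block.  */
  if (bb_freq < entry_freq)
    bb_freq = entry_freq;

  return hard_regs_num - (call_used_num * entry_freq) / bb_freq;
}
```

For example, with 16 registers in the class and 8 of them counted,
a block as frequent as the entry (1000 vs. 1000) gets 8 effective
registers, a block four times as frequent (4000 vs. 1000) gets 14, and a
block one hundred times as frequent (10000 vs. 100) gets all 16, because
the integer division rounds the penalty down to zero.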


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-22  7:45   ` Maxim Kuvyrkov
@ 2014-10-22 12:51     ` Richard Sandiford
  2014-10-22 14:47     ` Vladimir Makarov
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Sandiford @ 2014-10-22 12:51 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Vladimir Makarov, GCC Patches, Sebastian Pop

Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> writes:
> +      for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
> +	if (call_used_regs[i])
> +	  for (c = 0; c < ira_pressure_classes_num; ++c)
> +	    {
> +	      int j;
> +	      enum reg_class cl = ira_pressure_classes[c];
> +
> +	      for (j = 0; j < ira_class_hard_regs_num[cl]; ++j)
> +		if (ira_class_hard_regs[cl][j] == i)
> +		  {
> +		    /* Register I belongs to pressure class CL.  Pressure
> +		       classes do not intersect, so don't look further.  */
> +		    ++call_used_regs_num[cl];
> +		    break;
> +		  }
> +	    }

It'd be easier to iterate over the classes as the outer loop:

    for (int c = 0; c < ira_pressure_classes_num; ++c)
      {
        enum reg_class cl = ira_pressure_classes[c];
        for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
          if (call_used_regs[ira_class_hard_regs[cl][i]])
            ...

Thanks,
Richard
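
As a stand-alone illustration of the class-outer loop suggested above, the
sketch below counts call-used registers per class.  All tables and names
are hypothetical stand-ins for GCC's call_used_regs, ira_class_hard_regs
and ira_class_hard_regs_num, and the loop body elided in the mail is
assumed here to be a plain counter increment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target: 16 hard registers split into 2 pressure
   classes.  None of this is taken from a real backend.  */
enum { NUM_HARD_REGS = 16, NUM_CLASSES = 2 };

/* Stand-in for call_used_regs: registers 0..5 and 12 are call-used.  */
static const bool call_used[NUM_HARD_REGS] = {
  true, true, true, true, true, true, false, false,
  false, false, false, false, true, false, false, false
};

/* Stand-ins for ira_class_hard_regs and ira_class_hard_regs_num:
   class 0 holds registers 0..3, 12..15; class 1 holds 4..11.  */
static const short class_regs[NUM_CLASSES][NUM_HARD_REGS] = {
  { 0, 1, 2, 3, 12, 13, 14, 15 },
  { 4, 5, 6, 7, 8, 9, 10, 11 }
};
static const int class_regs_num[NUM_CLASSES] = { 8, 8 };

/* Store in COUNTS[C] the number of call-used registers in class C.
   Classes form the outer loop, so each register list is walked once
   and no per-register search is needed.  */
static void
count_call_used_regs (int counts[NUM_CLASSES])
{
  for (int c = 0; c < NUM_CLASSES; ++c)
    {
      counts[c] = 0;
      for (int i = 0; i < class_regs_num[c]; ++i)
	if (call_used[class_regs[c][i]])
	  ++counts[c];
    }
}
```

With these tables the counts come out as 5 for class 0 and 2 for class 1.
The inner loop visits every register of the class, so each call-used
register is counted exactly once.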


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-22  7:45   ` Maxim Kuvyrkov
  2014-10-22 12:51     ` Richard Sandiford
@ 2014-10-22 14:47     ` Vladimir Makarov
  2014-10-23  3:19       ` Maxim Kuvyrkov
  1 sibling, 1 reply; 18+ messages in thread
From: Vladimir Makarov @ 2014-10-22 14:47 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: GCC Patches, Richard Sandiford, Sebastian Pop

On 2014-10-22 2:17 AM, Maxim Kuvyrkov wrote:
> On Oct 22, 2014, at 4:24 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>> On 10/20/2014 02:57 AM, Maxim Kuvyrkov wrote:
>>> Hi,
>>>
>>> This patch improves register pressure scheduling (both SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate number of available registers.
>>>
>>> At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers.  The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue.  However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself.  In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers.
>>>
>>> The patch uses basic block frequencies to balance the cost of using call-used registers for intermediate cases between the two above extremes.
>>>
>>> The motivation for this patch was a floating-point testcase on arm-linux-gnueabihf (ARM is one of the few targets that use register pressure scheduling by default).
>>>
>>> A "thanks" goes to Richard good discussion of the problem and suggestions on the approach to fix it.
>>>
>>> The patch was bootstrapped on x86_64-linux-gnu (which doesn't really exercises the patch), and cross-tested on arm-linux-gnueabihf and aarch64-linux-gnu.
>>>
>>> OK to apply?
>>>
>> It is a pretty interesting idea for heuristic, Maxim.
>>
>> But I don't understand the following loop:
>>
>> +      for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
>> +	if (call_used_regs[i])
>> +	  for (c = 0; c < ira_pressure_classes_num; ++c)
>> +	    {
>> +	      enum reg_class cl = ira_pressure_classes[c];
>> +	      if (ira_class_hard_regs[cl][i])
>> +		++call_used_regs_num[cl];
>>
>>
>> ira_class_hard_regs[cl] is array containing hard registers belonging to
>> class CL.  So if GENERAL_REGS consists of hard regs 0..3, 12..15,  the
>> array will contain 8 elements 0..3, 12..15.  The array size is defined
>> by ira_class_hard_regs_num[cl].  So the index is order number of hard
>> reg in the class (starting from 0) but not hard register number itself.
>> Also the pressure classes never intersect so you can stop the inner loop
>> when you find class to which hard reg belongs to.
>
> Thanks for spotting this.  Indeed, this is a bug, but it still happened to correctly calculate numbers of call-used register for ARM (where I debugged the implementation).
>

Ok.

>>
>> I believe you should rewrite the code and get performance results again
>> to get an approval.
>
> Sebastian, could you run the geekbench again to make sure you see same performance numbers?
>

I guess there is no need for this, as the call-used register numbers were
calculated correctly before for aarch64.  That means we will get the same
results, and the results, as I read them, were good and promising enough
to implement this heuristic in GCC.


>>    You also missed the changelog.
>>
>
> The changelog was in the separate file.  Also attached here with the fixed patch.  Bootstrapped on x86_64-linux-gnu.  Bootstrap and regtest on arm-linux-gnueabihf is in progress.
>

Richard proposed a 2-loop solution instead of 3 loops, and I'd definitely
prefer his.

The patch is OK with Richard's 2-loop proposal.  Thanks, Maxim.  As I
wrote, the heuristic is an interesting one.

* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-22 14:47     ` Vladimir Makarov
@ 2014-10-23  3:19       ` Maxim Kuvyrkov
  2014-10-23  7:25         ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Maxim Kuvyrkov @ 2014-10-23  3:19 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: GCC Patches, Richard Sandiford, Sebastian Pop

[-- Attachment #1: Type: text/plain, Size: 417 bytes --]

On Oct 23, 2014, at 3:45 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:

...
> Richard proposed 2-loops solution instead of 3.  So I'd definitely prefer his.
> 
> The patch is ok with Richard's 2-loops proposal.  Thanks, Maxim.  As I wrote the heuristic is an interesting one.

This is the patch I'm going to commit after some more testing.  Thank you both for the reviews!

--
Maxim Kuvyrkov
www.linaro.org


[-- Attachment #2: 0001-sched_class_reg_num.ChangeLog --]
[-- Type: application/octet-stream, Size: 537 bytes --]

Account for prologue spills in reg_pressure scheduling

	* haifa-sched.c (sched_class_regs_num, call_used_regs_num): New static
	arrays.  Use sched_class_regs_num instead of ira_class_hard_regs_num.
	(print_curr_reg_pressure, setup_insn_reg_pressure_info,)
	(model_update_pressure, model_spill_cost): Use sched_class_regs_num.
	(model_start_schedule): Update.
	(sched_pressure_start_bb): New static function.  Calculate
	sched_class_regs_num.
	(schedule_block): Use it.
	(alloc_global_sched_pressure_data): Calculate call_used_regs_num.

[-- Attachment #3: 0001-sched_class_reg_num.patch --]
[-- Type: application/octet-stream, Size: 7928 bytes --]

From 684c5361bbb9d60c73e2d9bdb5cf78fe50824f2b Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Date: Mon, 20 Oct 2014 05:04:23 +0100
Subject: [PATCH 1/8] sched_class_reg_num

---
 gcc/haifa-sched.c |   96 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 83 insertions(+), 13 deletions(-)

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index db8a45c..724b107 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -933,6 +933,13 @@ static bitmap saved_reg_live;
 /* Registers mentioned in the current region.  */
 static bitmap region_ref_regs;
 
+/* Effective number of available registers of a given class (see comment
+   in sched_pressure_start_bb).  */
+static int sched_class_regs_num[N_REG_CLASSES];
+/* Number of call_used_regs.  This is a helper for calculating of
+   sched_class_regs_num.  */
+static int call_used_regs_num[N_REG_CLASSES];
+
 /* Initiate register pressure relative info for scheduling the current
    region.  Currently it is only clearing register mentioned in the
    current region.  */
@@ -1116,7 +1123,7 @@ print_curr_reg_pressure (void)
       gcc_assert (curr_reg_pressure[cl] >= 0);
       fprintf (sched_dump, "  %s:%d(%d)", reg_class_names[cl],
 	       curr_reg_pressure[cl],
-	       curr_reg_pressure[cl] - ira_class_hard_regs_num[cl]);
+	       curr_reg_pressure[cl] - sched_class_regs_num[cl]);
     }
   fprintf (sched_dump, "\n");
 }
@@ -1731,9 +1738,9 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
       cl = ira_pressure_classes[i];
       gcc_assert (curr_reg_pressure[cl] >= 0);
       change = (int) pressure_info[i].set_increase - death[cl];
-      before = MAX (0, max_reg_pressure[i] - ira_class_hard_regs_num[cl]);
+      before = MAX (0, max_reg_pressure[i] - sched_class_regs_num[cl]);
       after = MAX (0, max_reg_pressure[i] + change
-		   - ira_class_hard_regs_num[cl]);
+		   - sched_class_regs_num[cl]);
       hard_regno = ira_class_hard_regs[cl][0];
       gcc_assert (hard_regno >= 0);
       mode = reg_raw_mode[hard_regno];
@@ -2070,7 +2077,7 @@ model_update_pressure (struct model_pressure_group *group,
 
       /* Check whether the maximum pressure in the overall schedule
 	 has increased.  (This means that the MODEL_MAX_PRESSURE of
-	 every point <= POINT will need to increae too; see below.)  */
+	 every point <= POINT will need to increase too; see below.)  */
       if (group->limits[pci].pressure < ref_pressure)
 	group->limits[pci].pressure = ref_pressure;
 
@@ -2347,7 +2354,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 /* Return the cost of increasing the pressure in class CL from FROM to TO.
 
    Here we use the very simplistic cost model that every register above
-   ira_class_hard_regs_num[CL] has a spill cost of 1.  We could use other
+   sched_class_regs_num[CL] has a spill cost of 1.  We could use other
    measures instead, such as one based on MEMORY_MOVE_COST.  However:
 
       (1) In order for an instruction to be scheduled, the higher cost
@@ -2371,7 +2378,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 static int
 model_spill_cost (int cl, int from, int to)
 {
-  from = MAX (from, ira_class_hard_regs_num[cl]);
+  from = MAX (from, sched_class_regs_num[cl]);
   return MAX (to, from) - from;
 }
 
@@ -2477,7 +2484,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
   bool print_p;
 
   /* Record the baseECC value for each instruction in the model schedule,
-     except that negative costs are converted to zero ones now rather thatn
+     except that negative costs are converted to zero ones now rather than
      later.  Do not assign a cost to debug instructions, since they must
      not change code-generation decisions.  Experiments suggest we also
      get better results by not assigning a cost to instructions from
@@ -3727,15 +3734,13 @@ model_dump_pressure_summary (void)
    scheduling region.  */
 
 static void
-model_start_schedule (void)
+model_start_schedule (basic_block bb)
 {
-  basic_block bb;
-
   model_next_priority = 1;
   model_schedule.create (sched_max_luid);
   model_insns = XCNEWVEC (struct model_insn_info, sched_max_luid);
 
-  bb = BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head));
+  gcc_assert (bb == BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head)));
   initiate_reg_pressure_info (df_get_live_in (bb));
 
   model_analyze_insns ();
@@ -3773,6 +3778,53 @@ model_end_schedule (void)
   model_finalize_pressure_group (&model_before_pressure);
   model_schedule.release ();
 }
+
+/* Prepare reg pressure scheduling for basic block BB.  */
+static void
+sched_pressure_start_bb (basic_block bb)
+{
+  /* Set the number of available registers for each class taking into account
+     relative probability of current basic block versus function prologue and
+     epilogue.
+     * If the basic block executes much more often than the prologue/epilogue
+     (e.g., inside a hot loop), then cost of spill in the prologue is close to
+     nil, so the effective number of available registers is
+     (ira_class_hard_regs_num[cl] - 0).
+     * If the basic block executes as often as the prologue/epilogue,
+     then spill in the block is as costly as in the prologue, so the effective
+     number of available registers is
+     (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).
+     Note that all-else-equal, we prefer to spill in the prologue, since that
+     allows "extra" registers for other basic blocks of the function.
+     * If the basic block is on the cold path of the function and executes
+     rarely, then we should always prefer to spill in the block, rather than
+     in the prologue/epilogue.  The effective number of available register is
+     (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).  */
+  {
+    int i;
+    int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
+    int bb_freq = bb->frequency;
+
+    if (bb_freq == 0)
+      {
+	if (entry_freq == 0)
+	  entry_freq = bb_freq = 1;
+      }
+    if (bb_freq < entry_freq)
+      bb_freq = entry_freq;
+
+    for (i = 0; i < ira_pressure_classes_num; ++i)
+      {
+	enum reg_class cl = ira_pressure_classes[i];
+	sched_class_regs_num[cl] = ira_class_hard_regs_num[cl];
+	sched_class_regs_num[cl]
+	  -= (call_used_regs_num[cl] * entry_freq) / bb_freq;
+      }
+  }
+
+  if (sched_pressure == SCHED_PRESSURE_MODEL)
+    model_start_schedule (bb);
+}
 \f
 /* A structure that holds local state for the loop in schedule_block.  */
 struct sched_block_state
@@ -6053,8 +6105,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
      in try_ready () (which is called through init_ready_list ()).  */
   (*current_sched_info->init_ready_list) ();
 
-  if (sched_pressure == SCHED_PRESSURE_MODEL)
-    model_start_schedule ();
+  if (sched_pressure)
+    sched_pressure_start_bb (*target_bb);
 
   /* The algorithm is O(n^2) in the number of ready insns at any given
      time in the worst case.  Before reload we are more likely to have
@@ -6701,6 +6753,24 @@ alloc_global_sched_pressure_data (void)
 	  saved_reg_live = BITMAP_ALLOC (NULL);
 	  region_ref_regs = BITMAP_ALLOC (NULL);
 	}
+
+      /* Calculate number of CALL_USED_REGS in register classes that
+	 we calculate register pressure for.  */
+      for (int c = 0; c < ira_pressure_classes_num; ++c)
+	{
+	  enum reg_class cl = ira_pressure_classes[c];
+
+	  call_used_regs_num[cl] = 0;
+
+	  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
+	    if (call_used_regs[ira_class_hard_regs[cl][i]])
+	      {
+		/* Register I belongs to pressure class CL.  Pressure
+		   classes do not intersect, so don't look further.  */
+		++call_used_regs_num[cl];
+		break;
+	      }
+	}
     }
 }
 
-- 
1.7.9.5
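
The frequency scaling done in sched_pressure_start_bb can be tried in isolation. The following is only a sketch: the helper name effective_regs and its plain-int parameters are hypothetical stand-ins for the per-class arrays in the patch, and the assert is an added sanity check that the patch does not have.

```c
#include <assert.h>

/* Hypothetical standalone version of the arithmetic in
   sched_pressure_start_bb.  HARD_REGS plays the role of
   ira_class_hard_regs_num[cl] and CALL_USED_REGS the role of
   call_used_regs_num[cl]; both are assumptions made for illustration.  */
static int
effective_regs (int hard_regs, int call_used_regs,
		int entry_freq, int bb_freq)
{
  /* Sanity check added for this sketch only.  */
  assert (call_used_regs >= 0 && call_used_regs <= hard_regs
	  && entry_freq >= 0 && bb_freq >= 0);

  /* Avoid dividing by zero when no frequency data is available.  */
  if (bb_freq == 0)
    {
      if (entry_freq == 0)
	entry_freq = bb_freq = 1;
    }

  /* A block colder than the entry block is treated like the entry
     block, so the full penalty applies.  */
  if (bb_freq < entry_freq)
    bb_freq = entry_freq;

  /* Scale the penalty by how often the prologue runs relative to
     the block (integer division, as in the patch).  */
  return hard_regs - (call_used_regs * entry_freq) / bb_freq;
}
```

With made-up numbers (a class of 32 registers, 16 of them counted in call_used_regs_num, entry frequency 100), a block frequency of 10000 leaves all 32 registers available, 200 leaves 24, and both 100 and 1 leave 16.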



* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-23  3:19       ` Maxim Kuvyrkov
@ 2014-10-23  7:25         ` Richard Sandiford
  2014-10-23  7:28           ` Maxim Kuvyrkov
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2014-10-23  7:25 UTC (permalink / raw)
  To: Maxim Kuvyrkov; +Cc: Vladimir Makarov, GCC Patches, Sebastian Pop

Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> writes:
> @@ -6701,6 +6753,24 @@ alloc_global_sched_pressure_data (void)
>  	  saved_reg_live = BITMAP_ALLOC (NULL);
>  	  region_ref_regs = BITMAP_ALLOC (NULL);
>  	}
> +
> +      /* Calculate number of CALL_USED_REGS in register classes that
> +	 we calculate register pressure for.  */
> +      for (int c = 0; c < ira_pressure_classes_num; ++c)
> +	{
> +	  enum reg_class cl = ira_pressure_classes[c];
> +
> +	  call_used_regs_num[cl] = 0;
> +
> +	  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
> +	    if (call_used_regs[ira_class_hard_regs[cl][i]])
> +	      {
> +		/* Register I belongs to pressure class CL.  Pressure
> +		   classes do not intersect, so don't look further.  */
> +		++call_used_regs_num[cl];
> +		break;
> +	      }
> +	}

I don't think we want the break here.  The effect would be to count
at most one call-used register per pressure class.

Thanks,
Richard
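
The effect of the break can be reproduced outside GCC. This is a minimal sketch with a hypothetical helper, count_call_used, and caller-supplied tables in place of the real call_used_regs and ira_class_hard_regs arrays; the stop_at_first flag exists only to compare the two behaviours.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inner loop of the hunk above.
   CALL_USED is indexed by hard register number, CLASS_REGS lists the
   N hard registers of one pressure class.  */
static int
count_call_used (const bool *call_used, const int *class_regs, int n,
		 bool stop_at_first)
{
  int count = 0;

  assert (n >= 0);

  for (int i = 0; i < n; ++i)
    if (call_used[class_regs[i]])
      {
	++count;
	if (stop_at_first)
	  break;	/* As posted: leaves after the first hit.  */
      }

  return count;
}
```

For a class whose registers 0, 1 and 3 are call-used and register 2 is not, the variant with the break returns 1, while the variant without it returns 3.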


* Re: [PATCH] Account for prologue spills in reg_pressure scheduling
  2014-10-23  7:25         ` Richard Sandiford
@ 2014-10-23  7:28           ` Maxim Kuvyrkov
  0 siblings, 0 replies; 18+ messages in thread
From: Maxim Kuvyrkov @ 2014-10-23  7:28 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Vladimir Makarov, GCC Patches, Sebastian Pop

On Oct 23, 2014, at 8:20 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:

> Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> writes:
>> @@ -6701,6 +6753,24 @@ alloc_global_sched_pressure_data (void)
>> 	  saved_reg_live = BITMAP_ALLOC (NULL);
>> 	  region_ref_regs = BITMAP_ALLOC (NULL);
>> 	}
>> +
>> +      /* Calculate number of CALL_USED_REGS in register classes that
>> +	 we calculate register pressure for.  */
>> +      for (int c = 0; c < ira_pressure_classes_num; ++c)
>> +	{
>> +	  enum reg_class cl = ira_pressure_classes[c];
>> +
>> +	  call_used_regs_num[cl] = 0;
>> +
>> +	  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
>> +	    if (call_used_regs[ira_class_hard_regs[cl][i]])
>> +	      {
>> +		/* Register I belongs to pressure class CL.  Pressure
>> +		   classes do not intersect, so don't look further.  */
>> +		++call_used_regs_num[cl];
>> +		break;
>> +	      }
>> +	}
> 
> I don't think we want the break here.  The effect would be to count
> at most one call-used register per pressure class.

You make me feel dumb :-)

Thank you,

--
Maxim Kuvyrkov
www.linaro.org



Thread overview: 18+ messages
2014-10-20  7:03 [PATCH] Account for prologue spills in reg_pressure scheduling Maxim Kuvyrkov
2014-10-20 19:13 ` Sebastian Pop
2014-10-20 19:23   ` Maxim Kuvyrkov
2014-10-20 20:44     ` Sebastian Pop
2014-10-20 20:59       ` Maxim Kuvyrkov
2014-10-20 21:21         ` Richard Sandiford
2014-10-20 21:57           ` Ramana Radhakrishnan
2014-10-20 22:27             ` Maxim Kuvyrkov
2014-10-21  0:01             ` Sebastian Pop
2014-10-20 21:21         ` Sebastian Pop
2014-10-20 22:13         ` Evandro Menezes
2014-10-21 15:27 ` Vladimir Makarov
2014-10-22  7:45   ` Maxim Kuvyrkov
2014-10-22 12:51     ` Richard Sandiford
2014-10-22 14:47     ` Vladimir Makarov
2014-10-23  3:19       ` Maxim Kuvyrkov
2014-10-23  7:25         ` Richard Sandiford
2014-10-23  7:28           ` Maxim Kuvyrkov
