[PATCH GCC8][29/33]New register pressure estimation

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH GCC8][29/33]New register pressure estimation
@ 2017-04-18 10:53 Bin Cheng
  2017-05-11 10:53 ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Bin Cheng @ 2017-04-18 10:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1302 bytes --]

Hi,
Currently IVOPTs shares the same register pressure computation with RTL loop invariant pass,
which doesn't work very well.  This patch introduces specific interface for IVOPTs.
The general idea is described in the cover message as below:
  C) Current implementation shares the same register pressure computation with RTL loop
     inv pass.  It has difficulty in handling (especially large) loop nest, and quite
     often generating too many candidates (especially for outer loops).  This change
     introduces new register pressure computation.  The brief idea is to differentiate
     (hot) innermost loop and outer loop.  for (possibly hot) inner most, more registers
     are allowed as long as the register pressure is within the range of number of target
     available registers.
It can also help to restrict number of candidates for outer loop.
Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
	(ivopts_estimate_reg_pressure): New reg_pressure model function.
	(ivopts_global_cost_for_size): Delete.
	(determine_set_costs, iv_ca_recount_cost): Call new model function
	ivopts_estimate_reg_pressure.
	(determine_hot_innermost_loop): New.
	(tree_ssa_iv_optimize_loop): Call above function.

[-- Attachment #2: 0029-ivopt-reg_pressure-model-20170223.txt --]
[-- Type: text/plain, Size: 5465 bytes --]

From 2b6f11666a86f740a7f813eca26905ce15691d5e Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Fri, 10 Mar 2017 11:03:16 +0000
Subject: [PATCH 29/33] ivopt-reg_pressure-model-20170223.txt

---
 gcc/tree-ssa-loop-ivopts.c | 82 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 72 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index db8254c..464f96e 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -589,6 +589,9 @@ struct ivopts_data
 
   /* Whether the loop body can only be exited via single exit.  */
   bool loop_single_exit_p;
+
+  /* The current loop is the innermost loop and maybe hot.  */
+  bool hot_innermost_loop_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -5537,17 +5540,51 @@ determine_iv_costs (struct ivopts_data *data)
     fprintf (dump_file, "\n");
 }
 
-/* Calculates cost for having N_REGS registers.  This number includes
-   induction variables, invariant variables and invariant expressions.  */
+/* Estimate register pressure for loop having N_INVS invariants and N_CANDS
+   induction variables.  Note N_INVS includes both invariant variables and
+   invariant expressions.  */
 
 static unsigned
-ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
+ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
+			      unsigned n_cands)
 {
-  unsigned cost = estimate_reg_pressure_cost (n_regs,
-					      data->regs_used, data->speed,
-					      data->body_includes_call);
-  /* Add n_regs to the cost, so that we prefer eliminating ivs if possible.  */
-  return n_regs + cost;
+  unsigned cost;
+  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
+  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
+  bool speed = data->speed, hot_p = data->hot_innermost_loop_p;
+
+  /* If there is a call in the loop body, the call-clobbered registers
+     are not available for loop invariants.  */
+  if (data->body_includes_call)
+    available_regs = available_regs - target_clobbered_regs;
+
+  /* If we have enough registers.  */
+  if (regs_needed + target_res_regs < available_regs)
+    {
+      /* For the maybe hot innermost loop, we use available registers and
+	 not restrict the transformations unnecessarily.  For other loops,
+	 we want to use fewer register.  */
+      cost = hot_p ? 0 : target_reg_cost [speed] * n_new; //regs_needed;
+    }
+  /* If close to running out of registers, try to preserve them.  */
+  else if (regs_needed <= available_regs)
+    cost = target_reg_cost [speed] * regs_needed;
+  /* If we run out of available registers but the number of candidates
+     does not, we penalize extra registers using target_spill_cost.  */
+  else if (n_cands <= available_regs)
+    cost = target_reg_cost [speed] * available_regs
+	   + target_spill_cost [speed] * (regs_needed - available_regs);
+  /* If the number of candidates runs out available registers, we penalize
+     extra candidate registers using target_spill_cost * 2.  Because it is
+     more expensive to spill induction variable than invariant.  */
+  else
+    cost = target_reg_cost [speed] * available_regs
+	   + target_spill_cost [speed] * (n_cands - available_regs) * 2
+	   + target_spill_cost [speed] * (regs_needed - n_cands);
+
+  /* Finally, add the number of candidates, so that we prefer eliminating
+     induction variables if possible.  */
+  return cost + n_cands;
 }
 
 /* For each size of the induction variable set determine the penalty.  */
@@ -5607,7 +5644,7 @@ determine_set_costs (struct ivopts_data *data)
       fprintf (dump_file, "  ivs\tcost\n");
       for (j = 0; j <= 2 * target_avail_regs; j++)
 	fprintf (dump_file, "  %d\t%d\n", j,
-		 ivopts_global_cost_for_size (data, j));
+		 ivopts_estimate_reg_pressure (data, 0, j));
       fprintf (dump_file, "\n");
     }
 }
@@ -5666,7 +5703,7 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
   comp_cost cost = ivs->cand_use_cost;
 
   cost += ivs->cand_cost;
-  cost += ivopts_global_cost_for_size (data, ivs->n_invs + ivs->n_cands);
+  cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands);
   ivs->cost = cost;
 }
 
@@ -7367,6 +7404,30 @@ loop_body_includes_call (basic_block *body, unsigned num_nodes)
   return false;
 }
 
+/* Determine if current loop is the innermost loop and maybe hot.  */
+
+static void
+determine_hot_innermost_loop (struct ivopts_data *data)
+{
+  data->hot_innermost_loop_p = true;
+  if (!data->speed)
+    return;
+
+  struct loop *loop = data->current_loop;
+  if (loop->inner != NULL)
+    {
+      data->hot_innermost_loop_p = false;
+      return;
+    }
+
+  HOST_WIDE_INT niter = avg_loop_niter (loop);
+  if (niter < PARAM_VALUE (PARAM_AVG_LOOP_NITER)
+      || loop_constraint_set_p (loop, LOOP_C_PROLOG)
+      || loop_constraint_set_p (loop, LOOP_C_EPILOG)
+      || loop_constraint_set_p (loop, LOOP_C_VERSION))
+    data->hot_innermost_loop_p = false;
+}
+
 /* Optimizes the LOOP.  Returns true if anything changed.  */
 
 static bool
@@ -7381,6 +7442,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop)
   data->current_loop = loop;
   data->loop_loc = find_loop_location (loop);
   data->speed = optimize_loop_for_speed_p (loop);
+  determine_hot_innermost_loop (data);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-- 
1.9.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-04-18 10:53 [PATCH GCC8][29/33]New register pressure estimation Bin Cheng
@ 2017-05-11 10:53 ` Richard Biener
  2017-05-11 10:54   ` Bin.Cheng
  2017-05-15 15:56   ` Bin.Cheng
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Biener @ 2017-05-11 10:53 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, nd

On Tue, Apr 18, 2017 at 12:53 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> Currently IVOPTs shares the same register pressure computation with RTL loop invariant pass,
> which doesn't work very well.  This patch introduces specific interface for IVOPTs.
> The general idea is described in the cover message as below:
>   C) Current implementation shares the same register pressure computation with RTL loop
>      inv pass.  It has difficulty in handling (especially large) loop nest, and quite
>      often generating too many candidates (especially for outer loops).  This change
>      introduces new register pressure computation.  The brief idea is to differentiate
>      (hot) innermost loop and outer loop.  for (possibly hot) inner most, more registers
>      are allowed as long as the register pressure is within the range of number of target
>      available registers.
> It can also help to restrict number of candidates for outer loop.
> Is it OK?

+/* Determine if current loop is the innermost loop and maybe hot.  */
+
+static void
+determine_hot_innermost_loop (struct ivopts_data *data)
+{
+  data->hot_innermost_loop_p = true;
+  if (!data->speed)
+    return;

err, so when not optimizing for speed we assume all loops (even not innermost)
are hot and innermost?!

+  HOST_WIDE_INT niter = avg_loop_niter (loop);
+  if (niter < PARAM_VALUE (PARAM_AVG_LOOP_NITER)
+      || loop_constraint_set_p (loop, LOOP_C_PROLOG)
+      || loop_constraint_set_p (loop, LOOP_C_EPILOG)
+      || loop_constraint_set_p (loop, LOOP_C_VERSION))
+    data->hot_innermost_loop_p = false;

this needs adjustment for the constraint patch removal.  Also looking at niter
of the loop in question insn't a good metric for hotness.  data->speed is the
best guess you get I think (optimize_loop_for_speed_p).

   data->speed = optimize_loop_for_speed_p (loop);
+  determine_hot_innermost_loop (data);

  data->hot_innermost_loop_p = determine_hot_innermost_loop (data);

would be more consistent here.

Thanks,
Richard.

> Thanks,
> bin
> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
>         (ivopts_estimate_reg_pressure): New reg_pressure model function.
>         (ivopts_global_cost_for_size): Delete.
>         (determine_set_costs, iv_ca_recount_cost): Call new model function
>         ivopts_estimate_reg_pressure.
>         (determine_hot_innermost_loop): New.
>         (tree_ssa_iv_optimize_loop): Call above function.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-05-11 10:53 ` Richard Biener
@ 2017-05-11 10:54   ` Bin.Cheng
  2017-05-15 15:56   ` Bin.Cheng
  1 sibling, 0 replies; 10+ messages in thread
From: Bin.Cheng @ 2017-05-11 10:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Thu, May 11, 2017 at 11:39 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 12:53 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> Currently IVOPTs shares the same register pressure computation with RTL loop invariant pass,
>> which doesn't work very well.  This patch introduces specific interface for IVOPTs.
>> The general idea is described in the cover message as below:
>>   C) Current implementation shares the same register pressure computation with RTL loop
>>      inv pass.  It has difficulty in handling (especially large) loop nest, and quite
>>      often generating too many candidates (especially for outer loops).  This change
>>      introduces new register pressure computation.  The brief idea is to differentiate
>>      (hot) innermost loop and outer loop.  for (possibly hot) inner most, more registers
>>      are allowed as long as the register pressure is within the range of number of target
>>      available registers.
>> It can also help to restrict number of candidates for outer loop.
>> Is it OK?
>
> +/* Determine if current loop is the innermost loop and maybe hot.  */
> +
> +static void
> +determine_hot_innermost_loop (struct ivopts_data *data)
> +{
> +  data->hot_innermost_loop_p = true;
> +  if (!data->speed)
> +    return;
>
> err, so when not optimizing for speed we assume all loops (even not innermost)
> are hot and innermost?!
>
> +  HOST_WIDE_INT niter = avg_loop_niter (loop);
> +  if (niter < PARAM_VALUE (PARAM_AVG_LOOP_NITER)
> +      || loop_constraint_set_p (loop, LOOP_C_PROLOG)
> +      || loop_constraint_set_p (loop, LOOP_C_EPILOG)
> +      || loop_constraint_set_p (loop, LOOP_C_VERSION))
> +    data->hot_innermost_loop_p = false;
>
> this needs adjustment for the constraint patch removal.  Also looking at niter
> of the loop in question insn't a good metric for hotness.  data->speed is the
> best guess you get I think (optimize_loop_for_speed_p).
Yes, I also found this is inefficient finding the targeting loops.  I
will update and benchmark new patch discarding all *hot innermost*
part.  It can always be improved in future.

Thanks,
bin
>
>    data->speed = optimize_loop_for_speed_p (loop);
> +  determine_hot_innermost_loop (data);
>
>   data->hot_innermost_loop_p = determine_hot_innermost_loop (data);
>
> would be more consistent here.
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
>>         (ivopts_estimate_reg_pressure): New reg_pressure model function.
>>         (ivopts_global_cost_for_size): Delete.
>>         (determine_set_costs, iv_ca_recount_cost): Call new model function
>>         ivopts_estimate_reg_pressure.
>>         (determine_hot_innermost_loop): New.
>>         (tree_ssa_iv_optimize_loop): Call above function.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-05-11 10:53 ` Richard Biener
  2017-05-11 10:54   ` Bin.Cheng
@ 2017-05-15 15:56   ` Bin.Cheng
  2017-05-17 12:27     ` Richard Biener
  2017-05-17 12:44     ` Richard Sandiford
  1 sibling, 2 replies; 10+ messages in thread
From: Bin.Cheng @ 2017-05-15 15:56 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3054 bytes --]

On Thu, May 11, 2017 at 11:39 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 12:53 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> Currently IVOPTs shares the same register pressure computation with RTL loop invariant pass,
>> which doesn't work very well.  This patch introduces specific interface for IVOPTs.
>> The general idea is described in the cover message as below:
>>   C) Current implementation shares the same register pressure computation with RTL loop
>>      inv pass.  It has difficulty in handling (especially large) loop nest, and quite
>>      often generating too many candidates (especially for outer loops).  This change
>>      introduces new register pressure computation.  The brief idea is to differentiate
>>      (hot) innermost loop and outer loop.  for (possibly hot) inner most, more registers
>>      are allowed as long as the register pressure is within the range of number of target
>>      available registers.
>> It can also help to restrict number of candidates for outer loop.
>> Is it OK?
>
> +/* Determine if current loop is the innermost loop and maybe hot.  */
> +
> +static void
> +determine_hot_innermost_loop (struct ivopts_data *data)
> +{
> +  data->hot_innermost_loop_p = true;
> +  if (!data->speed)
> +    return;
>
> err, so when not optimizing for speed we assume all loops (even not innermost)
> are hot and innermost?!
>
> +  HOST_WIDE_INT niter = avg_loop_niter (loop);
> +  if (niter < PARAM_VALUE (PARAM_AVG_LOOP_NITER)
> +      || loop_constraint_set_p (loop, LOOP_C_PROLOG)
> +      || loop_constraint_set_p (loop, LOOP_C_EPILOG)
> +      || loop_constraint_set_p (loop, LOOP_C_VERSION))
> +    data->hot_innermost_loop_p = false;
>
> this needs adjustment for the constraint patch removal.  Also looking at niter
> of the loop in question insn't a good metric for hotness.  data->speed is the
> best guess you get I think (optimize_loop_for_speed_p).
>
>    data->speed = optimize_loop_for_speed_p (loop);
> +  determine_hot_innermost_loop (data);
>
>   data->hot_innermost_loop_p = determine_hot_innermost_loop (data);
>
> would be more consistent here.
Hi,
I removed the hot innermost part and here is the updated version.  Is it OK?

Thanks,
bin

2017-05-11  Bin Cheng  <bin.cheng@arm.com>

    * tree-ssa-loop-ivopts.c (ivopts_estimate_reg_pressure): New
    reg_pressure model function.
    (ivopts_global_cost_for_size): Delete.
    (determine_set_costs, iv_ca_recount_cost): Call new model function
    ivopts_estimate_reg_pressure.

>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
>>         (ivopts_estimate_reg_pressure): New reg_pressure model function.
>>         (ivopts_global_cost_for_size): Delete.
>>         (determine_set_costs, iv_ca_recount_cost): Call new model function
>>         ivopts_estimate_reg_pressure.
>>         (determine_hot_innermost_loop): New.
>>         (tree_ssa_iv_optimize_loop): Call above function.

[-- Attachment #2: 0001-ivopt-reg_pressure-model-20170225.txt.patch --]
[-- Type: text/x-patch, Size: 3710 bytes --]

From 3ca5cb6bafb516c68cad9d6fd3adbbe73bec4d19 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Fri, 10 Mar 2017 11:03:16 +0000
Subject: [PATCH 1/9] ivopt-reg_pressure-model-20170225.txt

---
 gcc/tree-ssa-loop-ivopts.c | 49 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 8b228ca..7caed10 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5531,17 +5531,46 @@ determine_iv_costs (struct ivopts_data *data)
     fprintf (dump_file, "\n");
 }
 
-/* Calculates cost for having N_REGS registers.  This number includes
-   induction variables, invariant variables and invariant expressions.  */
+/* Estimate register pressure for loop having N_INVS invariants and N_CANDS
+   induction variables.  Note N_INVS includes both invariant variables and
+   invariant expressions.  */
 
 static unsigned
-ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
+ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
+			      unsigned n_cands)
 {
-  unsigned cost = estimate_reg_pressure_cost (n_regs,
-					      data->regs_used, data->speed,
-					      data->body_includes_call);
-  /* Add n_regs to the cost, so that we prefer eliminating ivs if possible.  */
-  return n_regs + cost;
+  unsigned cost;
+  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
+  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
+  bool speed = data->speed;
+
+  /* If there is a call in the loop body, the call-clobbered registers
+     are not available for loop invariants.  */
+  if (data->body_includes_call)
+    available_regs = available_regs - target_clobbered_regs;
+
+  /* If we have enough registers.  */
+  if (regs_needed + target_res_regs < available_regs)
+    cost = n_new;
+  /* If close to running out of registers, try to preserve them.  */
+  else if (regs_needed <= available_regs)
+    cost = target_reg_cost [speed] * regs_needed;
+  /* If we run out of available registers but the number of candidates
+     does not, we penalize extra registers using target_spill_cost.  */
+  else if (n_cands <= available_regs)
+    cost = target_reg_cost [speed] * available_regs
+	   + target_spill_cost [speed] * (regs_needed - available_regs);
+  /* If the number of candidates runs out available registers, we penalize
+     extra candidate registers using target_spill_cost * 2.  Because it is
+     more expensive to spill induction variable than invariant.  */
+  else
+    cost = target_reg_cost [speed] * available_regs
+	   + target_spill_cost [speed] * (n_cands - available_regs) * 2
+	   + target_spill_cost [speed] * (regs_needed - n_cands);
+
+  /* Finally, add the number of candidates, so that we prefer eliminating
+     induction variables if possible.  */
+  return cost + n_cands;
 }
 
 /* For each size of the induction variable set determine the penalty.  */
@@ -5602,7 +5631,7 @@ determine_set_costs (struct ivopts_data *data)
       fprintf (dump_file, "  ivs\tcost\n");
       for (j = 0; j <= 2 * target_avail_regs; j++)
 	fprintf (dump_file, "  %d\t%d\n", j,
-		 ivopts_global_cost_for_size (data, j));
+		 ivopts_estimate_reg_pressure (data, 0, j));
       fprintf (dump_file, "\n");
     }
 }
@@ -5661,7 +5690,7 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
   comp_cost cost = ivs->cand_use_cost;
 
   cost += ivs->cand_cost;
-  cost += ivopts_global_cost_for_size (data, ivs->n_invs + ivs->n_cands);
+  cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands);
   ivs->cost = cost;
 }
 
-- 
1.9.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-05-15 15:56   ` Bin.Cheng
@ 2017-05-17 12:27     ` Richard Biener
  2017-06-07 11:02       ` Bin.Cheng
  2017-05-17 12:44     ` Richard Sandiford
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2017-05-17 12:27 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches

On Mon, May 15, 2017 at 5:50 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, May 11, 2017 at 11:39 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Apr 18, 2017 at 12:53 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> Currently IVOPTs shares the same register pressure computation with RTL loop invariant pass,
>>> which doesn't work very well.  This patch introduces specific interface for IVOPTs.
>>> The general idea is described in the cover message as below:
>>>   C) Current implementation shares the same register pressure computation with RTL loop
>>>      inv pass.  It has difficulty in handling (especially large) loop nest, and quite
>>>      often generating too many candidates (especially for outer loops).  This change
>>>      introduces new register pressure computation.  The brief idea is to differentiate
>>>      (hot) innermost loop and outer loop.  for (possibly hot) inner most, more registers
>>>      are allowed as long as the register pressure is within the range of number of target
>>>      available registers.
>>> It can also help to restrict number of candidates for outer loop.
>>> Is it OK?
>>
>> +/* Determine if current loop is the innermost loop and maybe hot.  */
>> +
>> +static void
>> +determine_hot_innermost_loop (struct ivopts_data *data)
>> +{
>> +  data->hot_innermost_loop_p = true;
>> +  if (!data->speed)
>> +    return;
>>
>> err, so when not optimizing for speed we assume all loops (even not innermost)
>> are hot and innermost?!
>>
>> +  HOST_WIDE_INT niter = avg_loop_niter (loop);
>> +  if (niter < PARAM_VALUE (PARAM_AVG_LOOP_NITER)
>> +      || loop_constraint_set_p (loop, LOOP_C_PROLOG)
>> +      || loop_constraint_set_p (loop, LOOP_C_EPILOG)
>> +      || loop_constraint_set_p (loop, LOOP_C_VERSION))
>> +    data->hot_innermost_loop_p = false;
>>
>> this needs adjustment for the constraint patch removal.  Also looking at niter
>> of the loop in question insn't a good metric for hotness.  data->speed is the
>> best guess you get I think (optimize_loop_for_speed_p).
>>
>>    data->speed = optimize_loop_for_speed_p (loop);
>> +  determine_hot_innermost_loop (data);
>>
>>   data->hot_innermost_loop_p = determine_hot_innermost_loop (data);
>>
>> would be more consistent here.
> Hi,
> I removed the hot innermost part and here is the updated version.  Is it OK?

Ok.

> Thanks,
> bin
>
> 2017-05-11  Bin Cheng  <bin.cheng@arm.com>
>
>     * tree-ssa-loop-ivopts.c (ivopts_estimate_reg_pressure): New
>     reg_pressure model function.
>     (ivopts_global_cost_for_size): Delete.
>     (determine_set_costs, iv_ca_recount_cost): Call new model function
>     ivopts_estimate_reg_pressure.
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         * tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
>>>         (ivopts_estimate_reg_pressure): New reg_pressure model function.
>>>         (ivopts_global_cost_for_size): Delete.
>>>         (determine_set_costs, iv_ca_recount_cost): Call new model function
>>>         ivopts_estimate_reg_pressure.
>>>         (determine_hot_innermost_loop): New.
>>>         (tree_ssa_iv_optimize_loop): Call above function.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-05-15 15:56   ` Bin.Cheng
  2017-05-17 12:27     ` Richard Biener
@ 2017-05-17 12:44     ` Richard Sandiford
  2017-05-18 10:25       ` Bin.Cheng
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2017-05-17 12:44 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Richard Biener, gcc-patches

"Bin.Cheng" <amker.cheng@gmail.com> writes:
> -/* Calculates cost for having N_REGS registers.  This number includes
> -   induction variables, invariant variables and invariant expressions.  */
> +/* Estimate register pressure for loop having N_INVS invariants and N_CANDS
> +   induction variables.  Note N_INVS includes both invariant variables and
> +   invariant expressions.  */
>  
>  static unsigned
> -ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
> +ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
> +			      unsigned n_cands)
>  {
> -  unsigned cost = estimate_reg_pressure_cost (n_regs,
> -					      data->regs_used, data->speed,
> -					      data->body_includes_call);
> -  /* Add n_regs to the cost, so that we prefer eliminating ivs if possible.  */
> -  return n_regs + cost;
> +  unsigned cost;
> +  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
> +  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
> +  bool speed = data->speed;
> +
> +  /* If there is a call in the loop body, the call-clobbered registers
> +     are not available for loop invariants.  */
> +  if (data->body_includes_call)
> +    available_regs = available_regs - target_clobbered_regs;
> +
> +  /* If we have enough registers.  */
> +  if (regs_needed + target_res_regs < available_regs)
> +    cost = n_new;
> +  /* If close to running out of registers, try to preserve them.  */
> +  else if (regs_needed <= available_regs)
> +    cost = target_reg_cost [speed] * regs_needed;
> +  /* If we run out of available registers but the number of candidates
> +     does not, we penalize extra registers using target_spill_cost.  */
> +  else if (n_cands <= available_regs)
> +    cost = target_reg_cost [speed] * available_regs
> +	   + target_spill_cost [speed] * (regs_needed - available_regs);
> +  /* If the number of candidates runs out available registers, we penalize
> +     extra candidate registers using target_spill_cost * 2.  Because it is
> +     more expensive to spill induction variable than invariant.  */
> +  else
> +    cost = target_reg_cost [speed] * available_regs
> +	   + target_spill_cost [speed] * (n_cands - available_regs) * 2
> +	   + target_spill_cost [speed] * (regs_needed - n_cands);
> +
> +  /* Finally, add the number of candidates, so that we prefer eliminating
> +     induction variables if possible.  */
> +  return cost + n_cands;

It looks like the return is mixing units.  Would it work to return
a <cost, n_cands> pair instead, and use lexicographical ordering?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-05-17 12:44     ` Richard Sandiford
@ 2017-05-18 10:25       ` Bin.Cheng
  2017-05-18 10:52         ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Bin.Cheng @ 2017-05-18 10:25 UTC (permalink / raw)
  To: Bin.Cheng, Richard Biener, gcc-patches, Richard Sandiford

On Wed, May 17, 2017 at 1:37 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> -/* Calculates cost for having N_REGS registers.  This number includes
>> -   induction variables, invariant variables and invariant expressions.  */
>> +/* Estimate register pressure for loop having N_INVS invariants and N_CANDS
>> +   induction variables.  Note N_INVS includes both invariant variables and
>> +   invariant expressions.  */
>>
>>  static unsigned
>> -ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
>> +ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
>> +                           unsigned n_cands)
>>  {
>> -  unsigned cost = estimate_reg_pressure_cost (n_regs,
>> -                                           data->regs_used, data->speed,
>> -                                           data->body_includes_call);
>> -  /* Add n_regs to the cost, so that we prefer eliminating ivs if possible.  */
>> -  return n_regs + cost;
>> +  unsigned cost;
>> +  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
>> +  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
>> +  bool speed = data->speed;
>> +
>> +  /* If there is a call in the loop body, the call-clobbered registers
>> +     are not available for loop invariants.  */
>> +  if (data->body_includes_call)
>> +    available_regs = available_regs - target_clobbered_regs;
>> +
>> +  /* If we have enough registers.  */
>> +  if (regs_needed + target_res_regs < available_regs)
>> +    cost = n_new;
>> +  /* If close to running out of registers, try to preserve them.  */
>> +  else if (regs_needed <= available_regs)
>> +    cost = target_reg_cost [speed] * regs_needed;
>> +  /* If we run out of available registers but the number of candidates
>> +     does not, we penalize extra registers using target_spill_cost.  */
>> +  else if (n_cands <= available_regs)
>> +    cost = target_reg_cost [speed] * available_regs
>> +        + target_spill_cost [speed] * (regs_needed - available_regs);
>> +  /* If the number of candidates runs out available registers, we penalize
>> +     extra candidate registers using target_spill_cost * 2.  Because it is
>> +     more expensive to spill induction variable than invariant.  */
>> +  else
>> +    cost = target_reg_cost [speed] * available_regs
>> +        + target_spill_cost [speed] * (n_cands - available_regs) * 2
>> +        + target_spill_cost [speed] * (regs_needed - n_cands);
>> +
>> +  /* Finally, add the number of candidates, so that we prefer eliminating
>> +     induction variables if possible.  */
>> +  return cost + n_cands;
>
> It looks like the return is mixing units.  Would it work to return
> a <cost, n_cands> pair instead, and use lexicographical ordering?
Hi Richard,
It just penalizes the cost by the number of candidates, rather than
returns n_cands to caller.  Actually that information is available all
the time in ivopts_data structure.

Thanks,
bin
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-05-18 10:25       ` Bin.Cheng
@ 2017-05-18 10:52         ` Richard Sandiford
  2017-05-18 11:10           ` Bin.Cheng
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2017-05-18 10:52 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Richard Biener, gcc-patches

"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Wed, May 17, 2017 at 1:37 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>>> -/* Calculates cost for having N_REGS registers.  This number includes
>>> -   induction variables, invariant variables and invariant expressions.  */
>>> +/* Estimate register pressure for loop having N_INVS invariants and N_CANDS
>>> +   induction variables.  Note N_INVS includes both invariant variables and
>>> +   invariant expressions.  */
>>>
>>>  static unsigned
>>> -ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
>>> +ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
>>> +                           unsigned n_cands)
>>>  {
>>> -  unsigned cost = estimate_reg_pressure_cost (n_regs,
>>> -                                           data->regs_used, data->speed,
>>> -                                           data->body_includes_call);
>>> - /* Add n_regs to the cost, so that we prefer eliminating ivs if
>>> possible.  */
>>> -  return n_regs + cost;
>>> +  unsigned cost;
>>> +  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
>>> +  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
>>> +  bool speed = data->speed;
>>> +
>>> +  /* If there is a call in the loop body, the call-clobbered registers
>>> +     are not available for loop invariants.  */
>>> +  if (data->body_includes_call)
>>> +    available_regs = available_regs - target_clobbered_regs;
>>> +
>>> +  /* If we have enough registers.  */
>>> +  if (regs_needed + target_res_regs < available_regs)
>>> +    cost = n_new;
>>> +  /* If close to running out of registers, try to preserve them.  */
>>> +  else if (regs_needed <= available_regs)
>>> +    cost = target_reg_cost [speed] * regs_needed;
>>> +  /* If we run out of available registers but the number of candidates
>>> +     does not, we penalize extra registers using target_spill_cost.  */
>>> +  else if (n_cands <= available_regs)
>>> +    cost = target_reg_cost [speed] * available_regs
>>> +        + target_spill_cost [speed] * (regs_needed - available_regs);
>>> +  /* If the number of candidates runs out available registers, we penalize
>>> +     extra candidate registers using target_spill_cost * 2.  Because it is
>>> +     more expensive to spill induction variable than invariant.  */
>>> +  else
>>> +    cost = target_reg_cost [speed] * available_regs
>>> +        + target_spill_cost [speed] * (n_cands - available_regs) * 2
>>> +        + target_spill_cost [speed] * (regs_needed - n_cands);
>>> +
>>> +  /* Finally, add the number of candidates, so that we prefer eliminating
>>> +     induction variables if possible.  */
>>> +  return cost + n_cands;
>>
>> It looks like the return is mixing units.  Would it work to return
>> a <cost, n_cands> pair instead, and use lexicographical ordering?
> Hi Richard,
> It just penalizes the cost by the number of candidates, rather than
> returns n_cands to caller.  Actually that information is available all
> the time in ivopts_data structure.

Yeah, but what I meant was: "cost" seems to be measured in abstract
instruction units, while "n_cands" counts a number of variables.
It doesn't seem to make sense to add them together.

If the idea is to use n_cands as a tie-breaker when the instruction
costs are the same, then lexicographical ordering of pairs would give
that without mixing the units.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-05-18 10:52         ` Richard Sandiford
@ 2017-05-18 11:10           ` Bin.Cheng
  0 siblings, 0 replies; 10+ messages in thread
From: Bin.Cheng @ 2017-05-18 11:10 UTC (permalink / raw)
  To: Bin.Cheng, Richard Biener, gcc-patches, Richard Sandiford

On Thu, May 18, 2017 at 11:41 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> On Wed, May 17, 2017 at 1:37 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>>>> -/* Calculates cost for having N_REGS registers.  This number includes
>>>> -   induction variables, invariant variables and invariant expressions.  */
>>>> +/* Estimate register pressure for loop having N_INVS invariants and N_CANDS
>>>> +   induction variables.  Note N_INVS includes both invariant variables and
>>>> +   invariant expressions.  */
>>>>
>>>>  static unsigned
>>>> -ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
>>>> +ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
>>>> +                           unsigned n_cands)
>>>>  {
>>>> -  unsigned cost = estimate_reg_pressure_cost (n_regs,
>>>> -                                           data->regs_used, data->speed,
>>>> -                                           data->body_includes_call);
>>>> - /* Add n_regs to the cost, so that we prefer eliminating ivs if
>>>> possible.  */
>>>> -  return n_regs + cost;
>>>> +  unsigned cost;
>>>> +  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
>>>> +  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
>>>> +  bool speed = data->speed;
>>>> +
>>>> +  /* If there is a call in the loop body, the call-clobbered registers
>>>> +     are not available for loop invariants.  */
>>>> +  if (data->body_includes_call)
>>>> +    available_regs = available_regs - target_clobbered_regs;
>>>> +
>>>> +  /* If we have enough registers.  */
>>>> +  if (regs_needed + target_res_regs < available_regs)
>>>> +    cost = n_new;
>>>> +  /* If close to running out of registers, try to preserve them.  */
>>>> +  else if (regs_needed <= available_regs)
>>>> +    cost = target_reg_cost [speed] * regs_needed;
>>>> +  /* If we run out of available registers but the number of candidates
>>>> +     does not, we penalize extra registers using target_spill_cost.  */
>>>> +  else if (n_cands <= available_regs)
>>>> +    cost = target_reg_cost [speed] * available_regs
>>>> +        + target_spill_cost [speed] * (regs_needed - available_regs);
>>>> +  /* If the number of candidates runs out available registers, we penalize
>>>> +     extra candidate registers using target_spill_cost * 2.  Because it is
>>>> +     more expensive to spill induction variable than invariant.  */
>>>> +  else
>>>> +    cost = target_reg_cost [speed] * available_regs
>>>> +        + target_spill_cost [speed] * (n_cands - available_regs) * 2
>>>> +        + target_spill_cost [speed] * (regs_needed - n_cands);
>>>> +
>>>> +  /* Finally, add the number of candidates, so that we prefer eliminating
>>>> +     induction variables if possible.  */
>>>> +  return cost + n_cands;
>>>
>>> It looks like the return is mixing units.  Would it work to return
>>> a <cost, n_cands> pair instead, and use lexicographical ordering?
>> Hi Richard,
>> It just penalizes the cost by the number of candidates, rather than
>> returns n_cands to caller.  Actually that information is available all
>> the time in ivopts_data structure.
>
> Yeah, but what I meant was: "cost" seems to be measured in abstract
> instruction units, while "n_cands" counts a number of variables.
> It doesn't seem to make sense to add them together.
Yes, unfortunately GCC adds up different costs and compares it together.
>
> If the idea is to use n_cands as a tie-breaker when the instruction
We don't have such cost model which different costs could be
applied/compared separately, also we don't have obvious break
conditions.  Sometimes the "instruction cost" is lower but results in
worse code because register cost is higher; but sometimes the opposite
holds.  LLVM has such separated cost models I guess?

Thanks,
bin
> costs are the same, then lexicographical ordering of pairs would give
> that without mixing the units.
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH GCC8][29/33]New register pressure estimation
  2017-05-17 12:27     ` Richard Biener
@ 2017-06-07 11:02       ` Bin.Cheng
  0 siblings, 0 replies; 10+ messages in thread
From: Bin.Cheng @ 2017-06-07 11:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Wed, May 17, 2017 at 1:24 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, May 15, 2017 at 5:50 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, May 11, 2017 at 11:39 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Apr 18, 2017 at 12:53 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> Currently IVOPTs shares the same register pressure computation with RTL loop invariant pass,
>>>> which doesn't work very well.  This patch introduces specific interface for IVOPTs.
>>>> The general idea is described in the cover message as below:
>>>>   C) Current implementation shares the same register pressure computation with RTL loop
>>>>      inv pass.  It has difficulty in handling (especially large) loop nest, and quite
>>>>      often generating too many candidates (especially for outer loops).  This change
>>>>      introduces new register pressure computation.  The brief idea is to differentiate
>>>>      (hot) innermost loop and outer loop.  for (possibly hot) inner most, more registers
>>>>      are allowed as long as the register pressure is within the range of number of target
>>>>      available registers.
>>>> It can also help to restrict number of candidates for outer loop.
>>>> Is it OK?
>>>
>>> +/* Determine if current loop is the innermost loop and maybe hot.  */
>>> +
>>> +static void
>>> +determine_hot_innermost_loop (struct ivopts_data *data)
>>> +{
>>> +  data->hot_innermost_loop_p = true;
>>> +  if (!data->speed)
>>> +    return;
>>>
>>> err, so when not optimizing for speed we assume all loops (even not innermost)
>>> are hot and innermost?!
>>>
>>> +  HOST_WIDE_INT niter = avg_loop_niter (loop);
>>> +  if (niter < PARAM_VALUE (PARAM_AVG_LOOP_NITER)
>>> +      || loop_constraint_set_p (loop, LOOP_C_PROLOG)
>>> +      || loop_constraint_set_p (loop, LOOP_C_EPILOG)
>>> +      || loop_constraint_set_p (loop, LOOP_C_VERSION))
>>> +    data->hot_innermost_loop_p = false;
>>>
>>> this needs adjustment for the constraint patch removal.  Also looking at niter
>>> of the loop in question insn't a good metric for hotness.  data->speed is the
>>> best guess you get I think (optimize_loop_for_speed_p).
>>>
>>>    data->speed = optimize_loop_for_speed_p (loop);
>>> +  determine_hot_innermost_loop (data);
>>>
>>>   data->hot_innermost_loop_p = determine_hot_innermost_loop (data);
>>>
>>> would be more consistent here.
>> Hi,
>> I removed the hot innermost part and here is the updated version.  Is it OK?
>
> Ok.
Sorry for the long time delay, I committed the patch @r248954

Thanks,
bin
>
>> Thanks,
>> bin
>>
>> 2017-05-11  Bin Cheng  <bin.cheng@arm.com>
>>
>>     * tree-ssa-loop-ivopts.c (ivopts_estimate_reg_pressure): New
>>     reg_pressure model function.
>>     (ivopts_global_cost_for_size): Delete.
>>     (determine_set_costs, iv_ca_recount_cost): Call new model function
>>     ivopts_estimate_reg_pressure.
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         * tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
>>>>         (ivopts_estimate_reg_pressure): New reg_pressure model function.
>>>>         (ivopts_global_cost_for_size): Delete.
>>>>         (determine_set_costs, iv_ca_recount_cost): Call new model function
>>>>         ivopts_estimate_reg_pressure.
>>>>         (determine_hot_innermost_loop): New.
>>>>         (tree_ssa_iv_optimize_loop): Call above function.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-06-07 11:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-18 10:53 [PATCH GCC8][29/33]New register pressure estimation Bin Cheng
2017-05-11 10:53 ` Richard Biener
2017-05-11 10:54   ` Bin.Cheng
2017-05-15 15:56   ` Bin.Cheng
2017-05-17 12:27     ` Richard Biener
2017-06-07 11:02       ` Bin.Cheng
2017-05-17 12:44     ` Richard Sandiford
2017-05-18 10:25       ` Bin.Cheng
2017-05-18 10:52         ` Richard Sandiford
2017-05-18 11:10           ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).