[PATCH][haifa-sched] model load/store multiples properly in autoprefetcher scheduling

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][haifa-sched] model load/store multiples properly in autoprefetcher scheduling
@ 2015-10-15  9:40 Kyrill Tkachov
  2015-10-15 10:16 ` Bernd Schmidt
  0 siblings, 1 reply; 5+ messages in thread
From: Kyrill Tkachov @ 2015-10-15  9:40 UTC (permalink / raw)
  To: GCC Patches; +Cc: Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 1897 bytes --]

Hi all,

I'd like to turn on the scheduling-for-autoprefetching heuristic from haifa-sched for aarch64.
This will have the effect of sorting sequences of loads and stores in ascending order of offsets from a common base.
However, there is a limitation with the current code that I'd like to remove first.

The code that analyzes the offsets of the loads/stores doesn't try to handle load/store-multiple insns.
These appear rather frequently in memory streaming workloads on aarch64 in the form of load-pair/store-pair instructions
i.e. ldp/stp.  In RTL, they are created by the sched_fusion pass + a subsequent peephole and during sched2 they appear
as PARALLEL rtxes of multiple SETs to/from memory.

This patch teaches autopref_multipass_init to handle these kinds of PARALLels as long as the SETs within them are all loads or all stores
and all use the same base register.  We now record the minimum and maximum offsets and use those when ranking pairs of insns.
The exact ranking logic is described in the new function autopref_rank_data.

For aarch64 this allows us to handle load/store pair instructions, for arm we now take into account the equivalent ldrd/strd instructions
and also other load/store-multiple instructions.

Bootstrapped and tested on arm, aarch64, x86_64.
This code is currently only enabled for arm for some cores, so it doesn't affect other platforms.

What do people think?

2015-10-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * sched-int.h (struct autopref_multipass_data_): Remove offset
     field.  Add min_offset, max_offset, multi_mem_insn_p fields.
     * haifa-sched.c (analyze_set_insn_for_autopref): New function.
     (autopref_multipass_init): Use it.  Handle PARALLEL sets.
     (autopref_rank_data): New function.
     (autopref_rank_for_schedule): Use it.
     (autopref_multipass_dfa_lookahead_guard_1): Likewise.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: haifa-sched-ldp.patch --]
[-- Type: text/x-patch; name=haifa-sched-ldp.patch, Size: 8026 bytes --]

commit a529ccd6a4e07463a40c7dbda10e3a090b0c06d3
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Sep 29 16:58:05 2015 +0100

    model load/store pairs properly in autoprefetcher scheduling

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index c35d777..223c72c 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -5533,6 +5533,35 @@ insn_finishes_cycle_p (rtx_insn *insn)
   return false;
 }
 
+/* Helper for autopref_multipass_init.  Given a SET insn in PAT and whether
+   we're expecting a memory WRITE or not, check that the insn is relevant to
+   the autoprefetcher modelling code.  Return true iff that is the case.
+   If it is relevant record the base register of the memory op in BASE and
+   the offset in OFFSET.  */
+
+static bool
+analyze_set_insn_for_autopref (rtx pat, int write, rtx *base, int *offset)
+{
+  if (GET_CODE (pat) != SET)
+    return false;
+
+  rtx mem = write ? SET_DEST (pat) : SET_SRC (pat);
+  if (!MEM_P (mem))
+    return false;
+
+  struct address_info info;
+  decompose_mem_address (&info, mem);
+
+  /* TODO: Currently only (base+const) addressing is supported.  */
+  if (info.base == NULL || !REG_P (*info.base)
+      || (info.disp != NULL && !CONST_INT_P (*info.disp)))
+    return false;
+
+  *base = *info.base;
+  *offset = info.disp ? INTVAL (*info.disp) : 0;
+  return true;
+}
+
 /* Functions to model cache auto-prefetcher.
 
    Some of the CPUs have cache auto-prefetcher, which /seems/ to initiate
@@ -5557,30 +5586,139 @@ autopref_multipass_init (const rtx_insn *insn, int write)
 
   gcc_assert (data->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED);
   data->base = NULL_RTX;
-  data->offset = 0;
+  data->min_offset = 0;
+  data->max_offset = 0;
+  data->multi_mem_insn_p = false;
   /* Set insn entry initialized, but not relevant for auto-prefetcher.  */
   data->status = AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
 
+  rtx pat = PATTERN (insn);
+
+  /* We have a multi-set insn like a load-multiple or store-multiple.
+     We care about these as long as all the memory ops inside the PARALLEL
+     have the same base register.  We care about the minimum and maximum
+     offsets from that base but don't check for the order of those offsets
+     within the PARALLEL insn itself.  */
+  if (GET_CODE (pat) == PARALLEL)
+    {
+      int n_elems = XVECLEN (pat, 0);
+
+      int i = 0;
+      rtx prev_base = NULL_RTX;
+      int min_offset;
+      int max_offset;
+
+      for (i = 0; i < n_elems; i++)
+	{
+	  rtx set = XVECEXP (pat, 0, i);
+	  if (GET_CODE (set) != SET)
+	    return;
+
+	  rtx base = NULL_RTX;
+	  int offset = 0;
+	  if (!analyze_set_insn_for_autopref (set, write, &base, &offset))
+	    return;
+
+	  if (i == 0)
+	    {
+	      prev_base = base;
+	      min_offset = offset;
+	      max_offset = offset;
+	    }
+	  /* Ensure that all memory operations in the PARALLEL use the same
+	     base register.  */
+	  else if (REGNO (base) != REGNO (prev_base))
+	    return;
+	  else
+	    {
+	      min_offset = MIN (min_offset, offset);
+	      max_offset = MAX (max_offset, offset);
+	    }
+	}
+
+      /* If we reached here then we have a valid PARALLEL of multiple memory
+	 ops with prev_base as the base and min_offset and max_offset
+	 containing the offsets range.  */
+      gcc_assert (prev_base);
+      data->base = prev_base;
+      data->min_offset = min_offset;
+      data->max_offset = max_offset;
+      data->multi_mem_insn_p = true;
+      data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+
+      return;
+    }
+
+  /* Otherwise this is a single set memory operation.  */
   rtx set = single_set (insn);
   if (set == NULL_RTX)
     return;
 
-  rtx mem = write ? SET_DEST (set) : SET_SRC (set);
-  if (!MEM_P (mem))
+  if (!analyze_set_insn_for_autopref (set, write, &data->base,
+				       &data->min_offset))
     return;
 
-  struct address_info info;
-  decompose_mem_address (&info, mem);
+  /* This insn is relevant for auto-prefetcher.
+     The base and offset fields will have been filled in the
+     analyze_set_insn_for_autopref call above.  */
+  data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+}
 
-  /* TODO: Currently only (base+const) addressing is supported.  */
-  if (info.base == NULL || !REG_P (*info.base)
-      || (info.disp != NULL && !CONST_INT_P (*info.disp)))
-    return;
 
-  /* This insn is relevant for auto-prefetcher.  */
-  data->base = *info.base;
-  data->offset = info.disp ? INTVAL (*info.disp) : 0;
-  data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+/* Helper for autopref_rank_for_schedule.  Given the data of two
+   insns relevant to the auto-prefetcher modelling code DATA1 and DATA2
+   return their comparison result.  Return 0 if there is no sensible
+   ranking order for the two insns.  */
+
+static int
+autopref_rank_data (autopref_multipass_data_t data1,
+		     autopref_multipass_data_t data2)
+{
+  /* Simple case when both insns are simple single memory ops.  */
+  if (!data1->multi_mem_insn_p && !data2->multi_mem_insn_p)
+    return data1->min_offset - data2->min_offset;
+
+  /* Two load/store multiple insns.  Return 0 if the offset ranges
+     overlap and the difference between the minimum offsets otherwise.  */
+  else if (data1->multi_mem_insn_p && data2->multi_mem_insn_p)
+    {
+      int min1 = data1->min_offset;
+      int max1 = data1->max_offset;
+      int min2 = data2->min_offset;
+      int max2 = data2->max_offset;
+
+      if (max1 < min2 || min1 > max2)
+	return min1 - min2;
+      else
+	return 0;
+    }
+
+  /* The other two cases is a pair of a load/store multiple and
+     a simple memory op.  Return 0 if the single op's offset is within the
+     range of the multi-op insn and the difference between the single offset
+     and the minimum offset of the multi-set insn otherwise.  */
+  else if (data1->multi_mem_insn_p && !data2->multi_mem_insn_p)
+    {
+      int max1 = data1->max_offset;
+      int min1 = data1->min_offset;
+
+      if (data2->min_offset >= min1
+	  && data2->min_offset <= max1)
+	return 0;
+      else
+	return min1 - data2->min_offset;
+    }
+  else
+    {
+      int max2 = data2->max_offset;
+      int min2 = data2->min_offset;
+
+      if (data1->min_offset >= min2
+	  && data1->min_offset <= max2)
+	return 0;
+      else
+	return data1->min_offset - min2;
+    }
 }
 
 /* Helper function for rank_for_schedule sorting.  */
@@ -5607,7 +5745,7 @@ autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2)
       if (!rtx_equal_p (data1->base, data2->base))
 	continue;
 
-      return data1->offset - data2->offset;
+      return autopref_rank_data (data1, data2);
     }
 
   return 0;
@@ -5632,8 +5770,9 @@ autopref_multipass_dfa_lookahead_guard_1 (const rtx_insn *insn1,
   if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
     return 0;
 
-  if (rtx_equal_p (data1->base, data2->base)
-      && data1->offset > data2->offset)
+  bool delay_p = rtx_equal_p (data1->base, data2->base)
+		  && autopref_rank_data (data1, data2) > 0;
+  if (delay_p)
     {
       if (sched_verbose >= 2)
 	{
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 800262c..12fa93c 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -807,8 +807,17 @@ struct autopref_multipass_data_
 {
   /* Base part of memory address.  */
   rtx base;
-  /* Memory offset.  */
-  int offset;
+
+  /* Memory offsets from the base.  For single simple sets
+     only min_offset is valid.  For multi-set insns min_offset
+     and max_offset record the minimum and maximum offsets from the same
+     base among the sets inside the PARALLEL.  */
+  int min_offset;
+  int max_offset;
+
+  /* Is this a load/store-multiple instruction.  */
+  bool multi_mem_insn_p;
+
   /* Entry status.  */
   enum autopref_multipass_data_status status;
 };

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][haifa-sched] model load/store multiples properly in autoprefetcher scheduling
  2015-10-15  9:40 [PATCH][haifa-sched] model load/store multiples properly in autoprefetcher scheduling Kyrill Tkachov
@ 2015-10-15 10:16 ` Bernd Schmidt
  2015-10-15 15:27   ` Kyrill Tkachov
  0 siblings, 1 reply; 5+ messages in thread
From: Bernd Schmidt @ 2015-10-15 10:16 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches; +Cc: Maxim Kuvyrkov

On 10/15/2015 11:40 AM, Kyrill Tkachov wrote:
> The code that analyzes the offsets of the loads/stores doesn't try to
> handle load/store-multiple insns.
> These appear rather frequently in memory streaming workloads on aarch64
> in the form of load-pair/store-pair instructions
> i.e. ldp/stp.  In RTL, they are created by the sched_fusion pass + a
> subsequent peephole and during sched2 they appear
> as PARALLEL rtxes of multiple SETs to/from memory.
>

>      * sched-int.h (struct autopref_multipass_data_): Remove offset
>      field.  Add min_offset, max_offset, multi_mem_insn_p fields.
>      * haifa-sched.c (analyze_set_insn_for_autopref): New function.
>      (autopref_multipass_init): Use it.  Handle PARALLEL sets.
>      (autopref_rank_data): New function.
>      (autopref_rank_for_schedule): Use it.
>      (autopref_multipass_dfa_lookahead_guard_1): Likewise.

Looks pretty reasonable to me. Ok to commit with a few changes next 
Wednesday unless you hear from Vlad in the meantime (I just want to give 
him time to look at it).

> +/* Helper for autopref_multipass_init.  Given a SET insn in PAT and whether
> +   we're expecting a memory WRITE or not, check that the insn is relevant to
> +   the autoprefetcher modelling code.  Return true iff that is the case.
> +   If it is relevant record the base register of the memory op in BASE and
> +   the offset in OFFSET.  */

Comma before "record". I think I'd just use "SET" rather than "SET 
insn", because in my mind an insn is always more than just a (part of a) 
pattern.

> +static bool
> +analyze_set_insn_for_autopref (rtx pat, int write, rtx *base, int *offset)

bool write?

> +  /* This insn is relevant for auto-prefetcher.

"the auto-prefetcher".

>     if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
>       return 0;
>
> -  if (rtx_equal_p (data1->base, data2->base)
> -      && data1->offset > data2->offset)
> +  bool delay_p = rtx_equal_p (data1->base, data2->base)
> +		  && autopref_rank_data (data1, data2) > 0;
> +  if (delay_p)

If you want to do this you still need parentheses around the multi-line 
expression. I'd just keep it inside the if condition.

> +  /* Is this a load/store-multiple instruction.  */
> +  bool multi_mem_insn_p;

Maybe write "True if this is a ..."


Bernd

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][haifa-sched] model load/store multiples properly in autoprefetcher scheduling
  2015-10-15 10:16 ` Bernd Schmidt
@ 2015-10-15 15:27   ` Kyrill Tkachov
  2015-10-16  3:55     ` Vladimir Makarov
  0 siblings, 1 reply; 5+ messages in thread
From: Kyrill Tkachov @ 2015-10-15 15:27 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches; +Cc: Maxim Kuvyrkov, Vladimir Makarov

[-- Attachment #1: Type: text/plain, Size: 3082 bytes --]


On 15/10/15 11:16, Bernd Schmidt wrote:
> On 10/15/2015 11:40 AM, Kyrill Tkachov wrote:
>> The code that analyzes the offsets of the loads/stores doesn't try to
>> handle load/store-multiple insns.
>> These appear rather frequently in memory streaming workloads on aarch64
>> in the form of load-pair/store-pair instructions
>> i.e. ldp/stp.  In RTL, they are created by the sched_fusion pass + a
>> subsequent peephole and during sched2 they appear
>> as PARALLEL rtxes of multiple SETs to/from memory.
>>
>
>>      * sched-int.h (struct autopref_multipass_data_): Remove offset
>>      field.  Add min_offset, max_offset, multi_mem_insn_p fields.
>>      * haifa-sched.c (analyze_set_insn_for_autopref): New function.
>>      (autopref_multipass_init): Use it.  Handle PARALLEL sets.
>>      (autopref_rank_data): New function.
>>      (autopref_rank_for_schedule): Use it.
>>      (autopref_multipass_dfa_lookahead_guard_1): Likewise.
>
> Looks pretty reasonable to me. Ok to commit with a few changes next Wednesday unless you hear from Vlad in the meantime (I just want to give him time to look at it).

Thanks, I'll wait as you suggested (and cc'ing Vlad).
In the meantime, here's the updated patch with the suggested changes for the record.

Kyrill

2015-10-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * sched-int.h (struct autopref_multipass_data_): Remove offset
     field.  Add min_offset, max_offset, multi_mem_insn_p fields.
     * haifa-sched.c (analyze_set_insn_for_autopref): New function.
     (autopref_multipass_init): Use it.  Handle PARALLEL sets.
     (autopref_rank_data): New function.
     (autopref_rank_for_schedule): Use it.
     (autopref_multipass_dfa_lookahead_guard_1): Likewise.

>
>> +/* Helper for autopref_multipass_init. Given a SET insn in PAT and whether
>> +   we're expecting a memory WRITE or not, check that the insn is relevant to
>> +   the autoprefetcher modelling code.  Return true iff that is the case.
>> +   If it is relevant record the base register of the memory op in BASE and
>> +   the offset in OFFSET.  */
>
> Comma before "record". I think I'd just use "SET" rather than "SET insn", because in my mind an insn is always more than just a (part of a) pattern.
>
>> +static bool
>> +analyze_set_insn_for_autopref (rtx pat, int write, rtx *base, int *offset)
>
> bool write?
>
>> +  /* This insn is relevant for auto-prefetcher.
>
> "the auto-prefetcher".
>
>>     if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
>>       return 0;
>>
>> -  if (rtx_equal_p (data1->base, data2->base)
>> -      && data1->offset > data2->offset)
>> +  bool delay_p = rtx_equal_p (data1->base, data2->base)
>> +          && autopref_rank_data (data1, data2) > 0;
>> +  if (delay_p)
>
> If you want to do this you still need parentheses around the multi-line expression. I'd just keep it inside the if condition.
>
>> +  /* Is this a load/store-multiple instruction.  */
>> +  bool multi_mem_insn_p;
>
> Maybe write "True if this is a ..."
>
>
> Bernd
>


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: haifa-sched-ldp.patch --]
[-- Type: text/x-patch; name=haifa-sched-ldp.patch, Size: 7901 bytes --]

commit e90bcc38961f2911f5ea158a415dcde807e3f551
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Tue Sep 29 16:58:05 2015 +0100

    model load/store pairs properly in autoprefetcher scheduling

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index c35d777..46751fe 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -5533,6 +5533,35 @@ insn_finishes_cycle_p (rtx_insn *insn)
   return false;
 }
 
+/* Helper for autopref_multipass_init.  Given a SET in PAT and whether
+   we're expecting a memory WRITE or not, check that the insn is relevant to
+   the autoprefetcher modelling code.  Return true iff that is the case.
+   If it is relevant, record the base register of the memory op in BASE and
+   the offset in OFFSET.  */
+
+static bool
+analyze_set_insn_for_autopref (rtx pat, bool write, rtx *base, int *offset)
+{
+  if (GET_CODE (pat) != SET)
+    return false;
+
+  rtx mem = write ? SET_DEST (pat) : SET_SRC (pat);
+  if (!MEM_P (mem))
+    return false;
+
+  struct address_info info;
+  decompose_mem_address (&info, mem);
+
+  /* TODO: Currently only (base+const) addressing is supported.  */
+  if (info.base == NULL || !REG_P (*info.base)
+      || (info.disp != NULL && !CONST_INT_P (*info.disp)))
+    return false;
+
+  *base = *info.base;
+  *offset = info.disp ? INTVAL (*info.disp) : 0;
+  return true;
+}
+
 /* Functions to model cache auto-prefetcher.
 
    Some of the CPUs have cache auto-prefetcher, which /seems/ to initiate
@@ -5557,30 +5586,139 @@ autopref_multipass_init (const rtx_insn *insn, int write)
 
   gcc_assert (data->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED);
   data->base = NULL_RTX;
-  data->offset = 0;
+  data->min_offset = 0;
+  data->max_offset = 0;
+  data->multi_mem_insn_p = false;
   /* Set insn entry initialized, but not relevant for auto-prefetcher.  */
   data->status = AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
 
+  rtx pat = PATTERN (insn);
+
+  /* We have a multi-set insn like a load-multiple or store-multiple.
+     We care about these as long as all the memory ops inside the PARALLEL
+     have the same base register.  We care about the minimum and maximum
+     offsets from that base but don't check for the order of those offsets
+     within the PARALLEL insn itself.  */
+  if (GET_CODE (pat) == PARALLEL)
+    {
+      int n_elems = XVECLEN (pat, 0);
+
+      int i = 0;
+      rtx prev_base = NULL_RTX;
+      int min_offset;
+      int max_offset;
+
+      for (i = 0; i < n_elems; i++)
+	{
+	  rtx set = XVECEXP (pat, 0, i);
+	  if (GET_CODE (set) != SET)
+	    return;
+
+	  rtx base = NULL_RTX;
+	  int offset = 0;
+	  if (!analyze_set_insn_for_autopref (set, write, &base, &offset))
+	    return;
+
+	  if (i == 0)
+	    {
+	      prev_base = base;
+	      min_offset = offset;
+	      max_offset = offset;
+	    }
+	  /* Ensure that all memory operations in the PARALLEL use the same
+	     base register.  */
+	  else if (REGNO (base) != REGNO (prev_base))
+	    return;
+	  else
+	    {
+	      min_offset = MIN (min_offset, offset);
+	      max_offset = MAX (max_offset, offset);
+	    }
+	}
+
+      /* If we reached here then we have a valid PARALLEL of multiple memory
+	 ops with prev_base as the base and min_offset and max_offset
+	 containing the offsets range.  */
+      gcc_assert (prev_base);
+      data->base = prev_base;
+      data->min_offset = min_offset;
+      data->max_offset = max_offset;
+      data->multi_mem_insn_p = true;
+      data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+
+      return;
+    }
+
+  /* Otherwise this is a single set memory operation.  */
   rtx set = single_set (insn);
   if (set == NULL_RTX)
     return;
 
-  rtx mem = write ? SET_DEST (set) : SET_SRC (set);
-  if (!MEM_P (mem))
+  if (!analyze_set_insn_for_autopref (set, write, &data->base,
+				       &data->min_offset))
     return;
 
-  struct address_info info;
-  decompose_mem_address (&info, mem);
+  /* This insn is relevant for the auto-prefetcher.
+     The base and offset fields will have been filled in the
+     analyze_set_insn_for_autopref call above.  */
+  data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+}
 
-  /* TODO: Currently only (base+const) addressing is supported.  */
-  if (info.base == NULL || !REG_P (*info.base)
-      || (info.disp != NULL && !CONST_INT_P (*info.disp)))
-    return;
 
-  /* This insn is relevant for auto-prefetcher.  */
-  data->base = *info.base;
-  data->offset = info.disp ? INTVAL (*info.disp) : 0;
-  data->status = AUTOPREF_MULTIPASS_DATA_NORMAL;
+/* Helper for autopref_rank_for_schedule.  Given the data of two
+   insns relevant to the auto-prefetcher modelling code DATA1 and DATA2
+   return their comparison result.  Return 0 if there is no sensible
+   ranking order for the two insns.  */
+
+static int
+autopref_rank_data (autopref_multipass_data_t data1,
+		     autopref_multipass_data_t data2)
+{
+  /* Simple case when both insns are simple single memory ops.  */
+  if (!data1->multi_mem_insn_p && !data2->multi_mem_insn_p)
+    return data1->min_offset - data2->min_offset;
+
+  /* Two load/store multiple insns.  Return 0 if the offset ranges
+     overlap and the difference between the minimum offsets otherwise.  */
+  else if (data1->multi_mem_insn_p && data2->multi_mem_insn_p)
+    {
+      int min1 = data1->min_offset;
+      int max1 = data1->max_offset;
+      int min2 = data2->min_offset;
+      int max2 = data2->max_offset;
+
+      if (max1 < min2 || min1 > max2)
+	return min1 - min2;
+      else
+	return 0;
+    }
+
+  /* The other two cases is a pair of a load/store multiple and
+     a simple memory op.  Return 0 if the single op's offset is within the
+     range of the multi-op insn and the difference between the single offset
+     and the minimum offset of the multi-set insn otherwise.  */
+  else if (data1->multi_mem_insn_p && !data2->multi_mem_insn_p)
+    {
+      int max1 = data1->max_offset;
+      int min1 = data1->min_offset;
+
+      if (data2->min_offset >= min1
+	  && data2->min_offset <= max1)
+	return 0;
+      else
+	return min1 - data2->min_offset;
+    }
+  else
+    {
+      int max2 = data2->max_offset;
+      int min2 = data2->min_offset;
+
+      if (data1->min_offset >= min2
+	  && data1->min_offset <= max2)
+	return 0;
+      else
+	return data1->min_offset - min2;
+    }
 }
 
 /* Helper function for rank_for_schedule sorting.  */
@@ -5607,7 +5745,7 @@ autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2)
       if (!rtx_equal_p (data1->base, data2->base))
 	continue;
 
-      return data1->offset - data2->offset;
+      return autopref_rank_data (data1, data2);
     }
 
   return 0;
@@ -5633,7 +5771,7 @@ autopref_multipass_dfa_lookahead_guard_1 (const rtx_insn *insn1,
     return 0;
 
   if (rtx_equal_p (data1->base, data2->base)
-      && data1->offset > data2->offset)
+      && autopref_rank_data (data1, data2) > 0)
     {
       if (sched_verbose >= 2)
 	{
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 800262c..86f5821 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -807,8 +807,17 @@ struct autopref_multipass_data_
 {
   /* Base part of memory address.  */
   rtx base;
-  /* Memory offset.  */
-  int offset;
+
+  /* Memory offsets from the base.  For single simple sets
+     only min_offset is valid.  For multi-set insns min_offset
+     and max_offset record the minimum and maximum offsets from the same
+     base among the sets inside the PARALLEL.  */
+  int min_offset;
+  int max_offset;
+
+  /* True if this is a load/store-multiple instruction.  */
+  bool multi_mem_insn_p;
+
   /* Entry status.  */
   enum autopref_multipass_data_status status;
 };

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][haifa-sched] model load/store multiples properly in autoprefetcher scheduling
  2015-10-15 15:27   ` Kyrill Tkachov
@ 2015-10-16  3:55     ` Vladimir Makarov
  2015-10-16 13:08       ` Kyrill Tkachov
  0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Makarov @ 2015-10-16  3:55 UTC (permalink / raw)
  To: Kyrill Tkachov, Bernd Schmidt, GCC Patches; +Cc: Maxim Kuvyrkov

On 10/15/2015 11:27 AM, Kyrill Tkachov wrote:
>
> On 15/10/15 11:16, Bernd Schmidt wrote:
>> On 10/15/2015 11:40 AM, Kyrill Tkachov wrote:
>>> The code that analyzes the offsets of the loads/stores doesn't try to
>>> handle load/store-multiple insns.
>>> These appear rather frequently in memory streaming workloads on aarch64
>>> in the form of load-pair/store-pair instructions
>>> i.e. ldp/stp.  In RTL, they are created by the sched_fusion pass + a
>>> subsequent peephole and during sched2 they appear
>>> as PARALLEL rtxes of multiple SETs to/from memory.
>>>
>>
>>>      * sched-int.h (struct autopref_multipass_data_): Remove offset
>>>      field.  Add min_offset, max_offset, multi_mem_insn_p fields.
>>>      * haifa-sched.c (analyze_set_insn_for_autopref): New function.
>>>      (autopref_multipass_init): Use it.  Handle PARALLEL sets.
>>>      (autopref_rank_data): New function.
>>>      (autopref_rank_for_schedule): Use it.
>>>      (autopref_multipass_dfa_lookahead_guard_1): Likewise.
>>
>> Looks pretty reasonable to me. Ok to commit with a few changes next 
>> Wednesday unless you hear from Vlad in the meantime (I just want to 
>> give him time to look at it).
>
> Thanks, I'll wait as you suggested (and cc'ing Vlad).
> In the meantime, here's the updated patch with the suggested changes 
> for the record.
Ok for me.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][haifa-sched] model load/store multiples properly in autoprefetcher scheduling
  2015-10-16  3:55     ` Vladimir Makarov
@ 2015-10-16 13:08       ` Kyrill Tkachov
  0 siblings, 0 replies; 5+ messages in thread
From: Kyrill Tkachov @ 2015-10-16 13:08 UTC (permalink / raw)
  To: Vladimir Makarov, Bernd Schmidt, GCC Patches; +Cc: Maxim Kuvyrkov


On 16/10/15 04:55, Vladimir Makarov wrote:
> On 10/15/2015 11:27 AM, Kyrill Tkachov wrote:
>>
>> On 15/10/15 11:16, Bernd Schmidt wrote:
>>> On 10/15/2015 11:40 AM, Kyrill Tkachov wrote:
>>>> The code that analyzes the offsets of the loads/stores doesn't try to
>>>> handle load/store-multiple insns.
>>>> These appear rather frequently in memory streaming workloads on aarch64
>>>> in the form of load-pair/store-pair instructions
>>>> i.e. ldp/stp.  In RTL, they are created by the sched_fusion pass + a
>>>> subsequent peephole and during sched2 they appear
>>>> as PARALLEL rtxes of multiple SETs to/from memory.
>>>>
>>>
>>>>      * sched-int.h (struct autopref_multipass_data_): Remove offset
>>>>      field.  Add min_offset, max_offset, multi_mem_insn_p fields.
>>>>      * haifa-sched.c (analyze_set_insn_for_autopref): New function.
>>>>      (autopref_multipass_init): Use it.  Handle PARALLEL sets.
>>>>      (autopref_rank_data): New function.
>>>>      (autopref_rank_for_schedule): Use it.
>>>>      (autopref_multipass_dfa_lookahead_guard_1): Likewise.
>>>
>>> Looks pretty reasonable to me. Ok to commit with a few changes next Wednesday unless you hear from Vlad in the meantime (I just want to give him time to look at it).
>>
>> Thanks, I'll wait as you suggested (and cc'ing Vlad).
>> In the meantime, here's the updated patch with the suggested changes for the record.
> Ok for me.
>

Thanks, I'll commit it on Monday then.
Cheers,
Kyrill

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-10-16 13:08 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-15  9:40 [PATCH][haifa-sched] model load/store multiples properly in autoprefetcher scheduling Kyrill Tkachov
2015-10-15 10:16 ` Bernd Schmidt
2015-10-15 15:27   ` Kyrill Tkachov
2015-10-16  3:55     ` Vladimir Makarov
2015-10-16 13:08       ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).