public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Add a couple new options to control loop prefetch pass
@ 2018-01-22 13:46 Luis Machado
  2018-01-22 14:01 ` [PATCH 1/2] Introduce prefetch-minimum stride option Luis Machado
  2018-01-22 14:10 ` [PATCH 2/2] Introduce prefetch-dynamic-strides option Luis Machado
  0 siblings, 2 replies; 28+ messages in thread
From: Luis Machado @ 2018-01-22 13:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: james.greenhalgh, Richard.Earnshaw

The following couple patches add options to better control the loop prefetch
pass.

With the current settings the pass tends to be very aggressive, issuing a lot
of prefetch hints. Most of these don't translate to better performance. Some
of these issued hints may even cause more cache evictions.

Luis Machado (2):
  Introduce prefetch-minimum stride option
  Introduce prefetch-dynamic-strides option.

 gcc/config/aarch64/aarch64-protos.h |  6 ++++++
 gcc/config/aarch64/aarch64.c        | 24 +++++++++++++++++++++++-
 gcc/doc/invoke.texi                 | 25 +++++++++++++++++++++++++
 gcc/params.def                      | 18 ++++++++++++++++++
 gcc/params.h                        |  4 ++++
 gcc/tree-ssa-loop-prefetch.c        | 26 ++++++++++++++++++++++++++
 6 files changed, 102 insertions(+), 1 deletion(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-01-22 13:46 [PATCH 0/2] Add a couple new options to control loop prefetch pass Luis Machado
@ 2018-01-22 14:01 ` Luis Machado
  2018-01-23  9:46   ` Kyrill Tkachov
                     ` (2 more replies)
  2018-01-22 14:10 ` [PATCH 2/2] Introduce prefetch-dynamic-strides option Luis Machado
  1 sibling, 3 replies; 28+ messages in thread
From: Luis Machado @ 2018-01-22 14:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: james.greenhalgh, Richard.Earnshaw

This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.

* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.

I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.

The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux.

Ok?

2018-01-22  Luis Machado  <luis.machado@linaro.org>

	Introduce option to limit software prefetching to known constant
	strides above a specific threshold with the goal of preventing
	conflicts with a hardware prefetcher.

	gcc/
	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
	<minimum_stride>: New const int field.
	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
	minimum_stride field.
	(exynosm1_prefetch_tune): Likewise.
	(thunderxt88_prefetch_tune): Likewise.
	(thunderx_prefetch_tune): Likewise.
	(thunderx2t99_prefetch_tune): Likewise.
	(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
	(aarch64_override_options_internal): Update to set
	PARAM_PREFETCH_MINIMUM_STRIDE.
	* doc/invoke.texi (prefetch-minimum-stride): Document new option.
	* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
	* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
	stride is constant and is below the minimum stride threshold.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 13 ++++++++++++-
 gcc/doc/invoke.texi                 | 15 +++++++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 16 ++++++++++++++++
 6 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ef1b0bc..8736bd9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* The minimum constant stride beyond which we should use prefetch
+     hints for.  */
+  const int minimum_stride;
   const int default_opt_level;
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 174310c..0ed9f14 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   1024,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  2048,			/* minimum_stride */
+  3			/* default_opt_level  */
 };
 
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
@@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   16*1024,		/* l2_cache_size  */
+  -1,			/* minimum_stride */
   3			/* default_opt_level  */
 };
 
@@ -583,6 +587,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   256,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -10461,6 +10467,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			   aarch64_tune_params.prefetch->l2_cache_size,
 			   opts->x_param_values,
 			   global_options_set.x_param_values);
+  if (aarch64_tune_params.prefetch->minimum_stride >= 0)
+    maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
+			   aarch64_tune_params.prefetch->minimum_stride,
+			   opts->x_param_values,
+			   global_options_set.x_param_values);
 
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27c5974..1cb1ef5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10567,6 +10567,21 @@ The size of L1 cache, in kilobytes.
 @item l2-cache-size
 The size of L2 cache, in kilobytes.
 
+@item prefetch-minimum-stride
+Minimum constant stride, in bytes, to start using prefetch hints for.  If
+the stride is less than this threshold, prefetch hints will not be issued.
+
+This setting is useful for processors that have hardware prefetchers, in
+which case there may be conflicts between the hardware prefetchers and
+the software prefetchers.  If the hardware prefetchers have a maximum
+stride they can handle, it should be used here to improve the use of
+software prefetchers.
+
+A value of -1, the default, means we don't have a threshold and therefore
+prefetch hints can be issued for any constant stride.
+
+This setting is only useful for strides that are known and constant.
+
 @item loop-interchange-max-num-stmts
 The maximum number of stmts in a loop to be interchanged.
 
diff --git a/gcc/params.def b/gcc/params.def
index 930b318..bf2d12c 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
 	  "The size of L2 cache.",
 	  512, 0, 0)
 
+/* The minimum constant stride beyond which we should use prefetch hints
+   for.  */
+
+DEFPARAM (PARAM_PREFETCH_MINIMUM_STRIDE,
+	  "prefetch-minimum-stride",
+	  "The minimum constant stride beyond which we should use prefetch "
+	  "hints for.",
+	  -1, 0, 0)
+
 /* Maximum number of statements in loop nest for loop interchange.  */
 
 DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
diff --git a/gcc/params.h b/gcc/params.h
index 98249d2..96012db 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -196,6 +196,8 @@ extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
 #define L2_CACHE_SIZE \
   PARAM_VALUE (PARAM_L2_CACHE_SIZE)
+#define PREFETCH_MINIMUM_STRIDE \
+  PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
 #define USE_CANONICAL_TYPES \
   PARAM_VALUE (PARAM_USE_CANONICAL_TYPES)
 #define IRA_MAX_LOOPS_NUM \
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 2f10db1..112ccac 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,22 @@ prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Some processors may have a hardware prefetcher that may conflict with
+     prefetch hints for a range of strides.  Make sure we don't issue
+     prefetches for such cases if the stride is within this particular
+     range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+      && absu_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Step for reference %u:%u (%d) is less than the mininum "
+		 " required stride of %d\n",
+		 ref->group->uid, ref->uid, int_cst_value (ref->group->step),
+		 PREFETCH_MINIMUM_STRIDE);
+      return false;
+    }
+
   /* For now do not issue prefetches for only first few of the
      iterations.  */
   if (ref->prefetch_before != PREFETCH_ALL)
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 2/2] Introduce prefetch-dynamic-strides option.
  2018-01-22 13:46 [PATCH 0/2] Add a couple new options to control loop prefetch pass Luis Machado
  2018-01-22 14:01 ` [PATCH 1/2] Introduce prefetch-minimum stride option Luis Machado
@ 2018-01-22 14:10 ` Luis Machado
  2018-01-23  9:53   ` Kyrill Tkachov
  2018-05-01 18:31   ` Jeff Law
  1 sibling, 2 replies; 28+ messages in thread
From: Luis Machado @ 2018-01-22 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: james.greenhalgh, Richard.Earnshaw

The following patch adds an option to control software prefetching of memory
references with non-constant/unknown strides.

Currently we prefetch these references if the pass thinks there is benefit to
doing so. But, since this is all based on heuristics, it's not always the case
that we end up with better performance.

For Falkor there is also the problem of conflicts with the hardware prefetcher,
so we need to be more conservative in terms of what we issue software prefetch
hints for.

This also aligns GCC with what LLVM does for Falkor.

Similarly to the previous patch, the defaults guarantee no change in behavior
for other targets and architectures.

I've regression-tested and bootstrapped it on aarch64-linux. No problems found.

Ok?

2018-01-22  Luis Machado  <luis.machado@linaro.org>

	Introduce option to control whether the software prefetch pass should
	issue prefetch hints for non-constant strides.

	gcc/
	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
	<prefetch_dynamic_strides>: New const unsigned int field.
	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
	prefetch_dynamic_strides.
	(exynosm1_prefetch_tune): Likewise.
	(thunderxt88_prefetch_tune): Likewise.
	(thunderx_prefetch_tune): Likewise.
	(thunderx2t99_prefetch_tune): Likewise.
	(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.
	(aarch64_override_options_internal): Update to set
	PARAM_PREFETCH_DYNAMIC_STRIDES.
	* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
	* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
	* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
	prefetch-dynamic-strides setting.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 11 +++++++++++
 gcc/doc/invoke.texi                 | 10 ++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 10 ++++++++++
 6 files changed, 45 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8736bd9..22bd9ae 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* Whether software prefetch hints should be issued for non-constant
+     strides.  */
+  const unsigned int prefetch_dynamic_strides;
   /* The minimum constant stride beyond which we should use prefetch
      hints for.  */
   const int minimum_stride;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0ed9f14..713b230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
@@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
@@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   1024,			/* l2_cache_size  */
+  0,			/* prefetch_dynamic_strides */
   2048,			/* minimum_stride */
   3			/* default_opt_level  */
 };
@@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   16*1024,		/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   3			/* default_opt_level  */
 };
@@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
@@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   256,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
@@ -10467,6 +10473,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			   aarch64_tune_params.prefetch->l2_cache_size,
 			   opts->x_param_values,
 			   global_options_set.x_param_values);
+  if (aarch64_tune_params.prefetch->prefetch_dynamic_strides == 0)
+    maybe_set_param_value (PARAM_PREFETCH_DYNAMIC_STRIDES,
+			   aarch64_tune_params.prefetch->prefetch_dynamic_strides,
+			   opts->x_param_values,
+			   global_options_set.x_param_values);
   if (aarch64_tune_params.prefetch->minimum_stride >= 0)
     maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
 			   aarch64_tune_params.prefetch->minimum_stride,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1cb1ef5..541c24c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10567,6 +10567,16 @@ The size of L1 cache, in kilobytes.
 @item l2-cache-size
 The size of L2 cache, in kilobytes.
 
+@item prefetch-dynamic-strides
+Whether the loop array prefetch pass should issue software prefetch hints
+for strides that are non-constant.  In some cases this may be
+beneficial, though the fact the stride is non-constant may make it
+hard to predict when there is clear benefit to issuing these hints.
+
+Set to 1, the default, if the prefetch hints should be issued for non-constant
+strides.  Set to 0 if prefetch hints should be issued only for strides that
+are known to be constant and below @option{prefetch-minimum-stride}.
+
 @item prefetch-minimum-stride
 Minimum constant stride, in bytes, to start using prefetch hints for.  If
 the stride is less than this threshold, prefetch hints will not be issued.
diff --git a/gcc/params.def b/gcc/params.def
index bf2d12c..c564178 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
 	  "The size of L2 cache.",
 	  512, 0, 0)
 
+/* Whether software prefetch hints should be issued for non-constant
+   strides.  */
+
+DEFPARAM (PARAM_PREFETCH_DYNAMIC_STRIDES,
+	  "prefetch-dynamic-strides",
+	  "Whether software prefetch hints should be issued for non-constant "
+	  "strides.",
+	  1, 0, 1)
+
 /* The minimum constant stride beyond which we should use prefetch hints
    for.  */
 
diff --git a/gcc/params.h b/gcc/params.h
index 96012db..8aa960a 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -196,6 +196,8 @@ extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
 #define L2_CACHE_SIZE \
   PARAM_VALUE (PARAM_L2_CACHE_SIZE)
+#define PREFETCH_DYNAMIC_STRIDES \
+  PARAM_VALUE (PARAM_PREFETCH_DYNAMIC_STRIDES)
 #define PREFETCH_MINIMUM_STRIDE \
   PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
 #define USE_CANONICAL_TYPES \
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 112ccac..de2acc8 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,16 @@ prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Do we want to issue prefetches for non-constant strides?  */
+  if (!cst_and_fits_in_hwi (ref->group->step) && PREFETCH_DYNAMIC_STRIDES == 0)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Skipping non-constant step for reference %u:%u\n",
+		 ref->group->uid, ref->uid);
+      return false;
+    }
+
   /* Some processors may have a hardware prefetcher that may conflict with
      prefetch hints for a range of strides.  Make sure we don't issue
      prefetches for such cases if the stride is within this particular
-- 
2.7.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-01-22 14:01 ` [PATCH 1/2] Introduce prefetch-minimum stride option Luis Machado
@ 2018-01-23  9:46   ` Kyrill Tkachov
  2018-01-23 13:23     ` Luis Machado
  2018-05-01 18:30   ` Jeff Law
  2018-05-14 21:21   ` Luis Machado
  2 siblings, 1 reply; 28+ messages in thread
From: Kyrill Tkachov @ 2018-01-23  9:46 UTC (permalink / raw)
  To: Luis Machado, gcc-patches; +Cc: James Greenhalgh, Richard Earnshaw

Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:
> This patch adds a new option to control the minimum stride, for a memory
> reference, after which the loop prefetch pass may issue software prefetch
> hints for. There are two motivations:
>
> * Make the pass less aggressive, only issuing prefetch hints for bigger strides
> that are more likely to benefit from prefetching. I've noticed a case in cpu2017
> where we were issuing thousands of hints, for example.
>

I've noticed a large amount of prefetch hints being issued as well, but had not
analysed it further.

> * For processors that have a hardware prefetcher, like Falkor, it allows the
> loop prefetch pass to defer prefetching of smaller (less than the threshold)
> strides to the hardware prefetcher instead. This prevents conflicts between
> the software prefetcher and the hardware prefetcher.
>
> I've noticed considerable reduction in the number of prefetch hints and
> slightly positive performance numbers. This aligns GCC and LLVM in terms of
> prefetch behavior for Falkor.

Do you, by any chance, have a link to the LLVM review that implemented that behavior?
It's okay if you don't, but I think it would be useful context.

>
> The default settings should guarantee no changes for existing targets. Those
> are free to tweak the settings as necessary.
>
> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>
> Ok?
>

Are there any benchmark numbers you can share?
I think this approach is sensible.

Since your patch touches generic code as well as AArch64
code you'll need an approval from a midend maintainer as well as an AArch64 maintainer.
Also, GCC development is now in the regression fixing stage, so unless this fixes a regression
it may have to wait until GCC 9 development is opened.

Thanks,
Kyrill

> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
>
>         Introduce option to limit software prefetching to known constant
>         strides above a specific threshold with the goal of preventing
>         conflicts with a hardware prefetcher.
>
>         gcc/
>         * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>         <minimum_stride>: New const int field.
>         * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
>         minimum_stride field.
>         (exynosm1_prefetch_tune): Likewise.
>         (thunderxt88_prefetch_tune): Likewise.
>         (thunderx_prefetch_tune): Likewise.
>         (thunderx2t99_prefetch_tune): Likewise.
>         (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
>         (aarch64_override_options_internal): Update to set
>         PARAM_PREFETCH_MINIMUM_STRIDE.
>         * doc/invoke.texi (prefetch-minimum-stride): Document new option.
>         * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
>         * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
>         * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
>         stride is constant and is below the minimum stride threshold.
> ---
>  gcc/config/aarch64/aarch64-protos.h |  3 +++
>  gcc/config/aarch64/aarch64.c        | 13 ++++++++++++-
>  gcc/doc/invoke.texi                 | 15 +++++++++++++++
>  gcc/params.def                      |  9 +++++++++
>  gcc/params.h                        |  2 ++
>  gcc/tree-ssa-loop-prefetch.c        | 16 ++++++++++++++++
>  6 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index ef1b0bc..8736bd9 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -230,6 +230,9 @@ struct cpu_prefetch_tune
>    const int l1_cache_size;
>    const int l1_cache_line_size;
>    const int l2_cache_size;
> +  /* The minimum constant stride beyond which we should use prefetch
> +     hints for.  */
> +  const int minimum_stride;
>    const int default_opt_level;
>  };
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 174310c..0ed9f14 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
>    -1,                  /* l1_cache_size  */
>    -1,                  /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
>
> @@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
>    -1,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
>
> @@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    1024,                        /* l2_cache_size  */
> -  -1                   /* default_opt_level  */
> +  2048,                        /* minimum_stride */
> +  3                    /* default_opt_level  */
>  };
>
>  static const cpu_prefetch_tune thunderxt88_prefetch_tune =
> @@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    128,                 /* l1_cache_line_size  */
>    16*1024,             /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    3                    /* default_opt_level  */
>  };
>
> @@ -583,6 +587,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    128,                 /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
>
> @@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    256,                 /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
>
> @@ -10461,6 +10467,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
> aarch64_tune_params.prefetch->l2_cache_size,
>                             opts->x_param_values,
> global_options_set.x_param_values);
> +  if (aarch64_tune_params.prefetch->minimum_stride >= 0)
> +    maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
> + aarch64_tune_params.prefetch->minimum_stride,
> +                          opts->x_param_values,
> + global_options_set.x_param_values);
>
>    /* Use the alternative scheduling-pressure algorithm by default.  */
>    maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 27c5974..1cb1ef5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10567,6 +10567,21 @@ The size of L1 cache, in kilobytes.
>  @item l2-cache-size
>  The size of L2 cache, in kilobytes.
>
> +@item prefetch-minimum-stride
> +Minimum constant stride, in bytes, to start using prefetch hints for.  If
> +the stride is less than this threshold, prefetch hints will not be issued.
> +
> +This setting is useful for processors that have hardware prefetchers, in
> +which case there may be conflicts between the hardware prefetchers and
> +the software prefetchers.  If the hardware prefetchers have a maximum
> +stride they can handle, it should be used here to improve the use of
> +software prefetchers.
> +
> +A value of -1, the default, means we don't have a threshold and therefore
> +prefetch hints can be issued for any constant stride.
> +
> +This setting is only useful for strides that are known and constant.
> +
>  @item loop-interchange-max-num-stmts
>  The maximum number of stmts in a loop to be interchanged.
>
> diff --git a/gcc/params.def b/gcc/params.def
> index 930b318..bf2d12c 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
>            "The size of L2 cache.",
>            512, 0, 0)
>
> +/* The minimum constant stride beyond which we should use prefetch hints
> +   for.  */
> +
> +DEFPARAM (PARAM_PREFETCH_MINIMUM_STRIDE,
> +         "prefetch-minimum-stride",
> +         "The minimum constant stride beyond which we should use prefetch "
> +         "hints for.",
> +         -1, 0, 0)
> +
>  /* Maximum number of statements in loop nest for loop interchange.  */
>
>  DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
> diff --git a/gcc/params.h b/gcc/params.h
> index 98249d2..96012db 100644
> --- a/gcc/params.h
> +++ b/gcc/params.h
> @@ -196,6 +196,8 @@ extern void init_param_values (int *params);
>    PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
>  #define L2_CACHE_SIZE \
>    PARAM_VALUE (PARAM_L2_CACHE_SIZE)
> +#define PREFETCH_MINIMUM_STRIDE \
> +  PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
>  #define USE_CANONICAL_TYPES \
>    PARAM_VALUE (PARAM_USE_CANONICAL_TYPES)
>  #define IRA_MAX_LOOPS_NUM \
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 2f10db1..112ccac 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -992,6 +992,22 @@ prune_by_reuse (struct mem_ref_group *groups)
>  static bool
>  should_issue_prefetch_p (struct mem_ref *ref)
>  {
> +  /* Some processors may have a hardware prefetcher that may conflict with
> +     prefetch hints for a range of strides.  Make sure we don't issue
> +     prefetches for such cases if the stride is within this particular
> +     range.  */
> +  if (cst_and_fits_in_hwi (ref->group->step)
> +      && absu_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file,
> +                "Step for reference %u:%u (%d) is less than the mininum "
> +                " required stride of %d\n",
> +                ref->group->uid, ref->uid, int_cst_value (ref->group->step),
> +                PREFETCH_MINIMUM_STRIDE);
> +      return false;
> +    }
> +
>    /* For now do not issue prefetches for only first few of the
>       iterations.  */
>    if (ref->prefetch_before != PREFETCH_ALL)
> -- 
> 2.7.4
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
  2018-01-22 14:10 ` [PATCH 2/2] Introduce prefetch-dynamic-strides option Luis Machado
@ 2018-01-23  9:53   ` Kyrill Tkachov
  2018-01-23 13:32     ` Luis Machado
  2018-05-01 18:31   ` Jeff Law
  1 sibling, 1 reply; 28+ messages in thread
From: Kyrill Tkachov @ 2018-01-23  9:53 UTC (permalink / raw)
  To: Luis Machado, gcc-patches; +Cc: James Greenhalgh, Richard Earnshaw

Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:
> The following patch adds an option to control software prefetching of memory
> references with non-constant/unknown strides.
>
> Currently we prefetch these references if the pass thinks there is benefit to
> doing so. But, since this is all based on heuristics, it's not always the case
> that we end up with better performance.
>
> For Falkor there is also the problem of conflicts with the hardware prefetcher,
> so we need to be more conservative in terms of what we issue software prefetch
> hints for.
>
> This also aligns GCC with what LLVM does for Falkor.
>
> Similarly to the previous patch, the defaults guarantee no change in behavior
> for other targets and architectures.
>
> I've regression-tested and bootstrapped it on aarch64-linux. No problems found.
>
> Ok?
>

This also looks like a sensible approach to me with a caveat inline.
The same general comments as for patch [1/2] apply.

Thanks,
Kyrill

> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
>
>         Introduce option to control whether the software prefetch pass should
>         issue prefetch hints for non-constant strides.
>
>         gcc/
>         * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>         <prefetch_dynamic_strides>: New const unsigned int field.
>         * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
>         prefetch_dynamic_strides.
>         (exynosm1_prefetch_tune): Likewise.
>         (thunderxt88_prefetch_tune): Likewise.
>         (thunderx_prefetch_tune): Likewise.
>         (thunderx2t99_prefetch_tune): Likewise.
>         (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.
>         (aarch64_override_options_internal): Update to set
>         PARAM_PREFETCH_DYNAMIC_STRIDES.
>         * doc/invoke.texi (prefetch-dynamic-strides): Document new option.
>         * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
>         * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
>         * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
>         prefetch-dynamic-strides setting.
> ---
>  gcc/config/aarch64/aarch64-protos.h |  3 +++
>  gcc/config/aarch64/aarch64.c        | 11 +++++++++++
>  gcc/doc/invoke.texi                 | 10 ++++++++++
>  gcc/params.def                      |  9 +++++++++
>  gcc/params.h                        |  2 ++
>  gcc/tree-ssa-loop-prefetch.c        | 10 ++++++++++
>  6 files changed, 45 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 8736bd9..22bd9ae 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -230,6 +230,9 @@ struct cpu_prefetch_tune
>    const int l1_cache_size;
>    const int l1_cache_line_size;
>    const int l2_cache_size;
> +  /* Whether software prefetch hints should be issued for non-constant
> +     strides.  */
> +  const unsigned int prefetch_dynamic_strides;

I understand that the midend PARAMs are defined as integers, but I think
the backend tuning option here is better represented as a boolean as it really
is just a yes/no decision.

>    /* The minimum constant stride beyond which we should use prefetch
>       hints for.  */
>    const int minimum_stride;
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0ed9f14..713b230 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
>    -1,                  /* l1_cache_size  */
>    -1,                  /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  1,                   /* prefetch_dynamic_strides */
>    -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
> @@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
>    -1,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  1,                   /* prefetch_dynamic_strides */
>    -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
> @@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    1024,                        /* l2_cache_size  */
> +  0,                   /* prefetch_dynamic_strides */
>    2048,                        /* minimum_stride */
>    3                    /* default_opt_level  */
>  };
> @@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    128,                 /* l1_cache_line_size  */
>    16*1024,             /* l2_cache_size  */
> +  1,                   /* prefetch_dynamic_strides */
>    -1,                  /* minimum_stride */
>    3                    /* default_opt_level  */
>  };
> @@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    128,                 /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  1,                   /* prefetch_dynamic_strides */
>    -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
> @@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    256,                 /* l2_cache_size  */
> +  1,                   /* prefetch_dynamic_strides */
>    -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
> @@ -10467,6 +10473,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
> aarch64_tune_params.prefetch->l2_cache_size,
>                             opts->x_param_values,
> global_options_set.x_param_values);
> +  if (aarch64_tune_params.prefetch->prefetch_dynamic_strides == 0)
> +    maybe_set_param_value (PARAM_PREFETCH_DYNAMIC_STRIDES,
> + aarch64_tune_params.prefetch->prefetch_dynamic_strides,
> +                          opts->x_param_values,
> + global_options_set.x_param_values);
>    if (aarch64_tune_params.prefetch->minimum_stride >= 0)
>      maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
> aarch64_tune_params.prefetch->minimum_stride,
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1cb1ef5..541c24c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10567,6 +10567,16 @@ The size of L1 cache, in kilobytes.
>  @item l2-cache-size
>  The size of L2 cache, in kilobytes.
>
> +@item prefetch-dynamic-strides
> +Whether the loop array prefetch pass should issue software prefetch hints
> +for strides that are non-constant.  In some cases this may be
> +beneficial, though the fact the stride is non-constant may make it
> +hard to predict when there is clear benefit to issuing these hints.
> +
> +Set to 1, the default, if the prefetch hints should be issued for non-constant
> +strides.  Set to 0 if prefetch hints should be issued only for strides that
> +are known to be constant and below @option{prefetch-minimum-stride}.
> +
>  @item prefetch-minimum-stride
>  Minimum constant stride, in bytes, to start using prefetch hints for.  If
>  the stride is less than this threshold, prefetch hints will not be issued.
> diff --git a/gcc/params.def b/gcc/params.def
> index bf2d12c..c564178 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
>            "The size of L2 cache.",
>            512, 0, 0)
>
> +/* Whether software prefetch hints should be issued for non-constant
> +   strides.  */
> +
> +DEFPARAM (PARAM_PREFETCH_DYNAMIC_STRIDES,
> +         "prefetch-dynamic-strides",
> +         "Whether software prefetch hints should be issued for non-constant "
> +         "strides.",
> +         1, 0, 1)
> +
>  /* The minimum constant stride beyond which we should use prefetch hints
>     for.  */
>
> diff --git a/gcc/params.h b/gcc/params.h
> index 96012db..8aa960a 100644
> --- a/gcc/params.h
> +++ b/gcc/params.h
> @@ -196,6 +196,8 @@ extern void init_param_values (int *params);
>    PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
>  #define L2_CACHE_SIZE \
>    PARAM_VALUE (PARAM_L2_CACHE_SIZE)
> +#define PREFETCH_DYNAMIC_STRIDES \
> +  PARAM_VALUE (PARAM_PREFETCH_DYNAMIC_STRIDES)
>  #define PREFETCH_MINIMUM_STRIDE \
>    PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
>  #define USE_CANONICAL_TYPES \
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 112ccac..de2acc8 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -992,6 +992,16 @@ prune_by_reuse (struct mem_ref_group *groups)
>  static bool
>  should_issue_prefetch_p (struct mem_ref *ref)
>  {
> +  /* Do we want to issue prefetches for non-constant strides?  */
> +  if (!cst_and_fits_in_hwi (ref->group->step) && PREFETCH_DYNAMIC_STRIDES == 0)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file,
> +                "Skipping non-constant step for reference %u:%u\n",
> +                ref->group->uid, ref->uid);
> +      return false;
> +    }
> +
>    /* Some processors may have a hardware prefetcher that may conflict with
>       prefetch hints for a range of strides.  Make sure we don't issue
>       prefetches for such cases if the stride is within this particular
> -- 
> 2.7.4
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-01-23  9:46   ` Kyrill Tkachov
@ 2018-01-23 13:23     ` Luis Machado
  0 siblings, 0 replies; 28+ messages in thread
From: Luis Machado @ 2018-01-23 13:23 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches; +Cc: James Greenhalgh, Richard Earnshaw

Hi Kyrill,

On 01/23/2018 07:32 AM, Kyrill Tkachov wrote:
> Hi Luis,
> 
> On 22/01/18 13:46, Luis Machado wrote:
>> This patch adds a new option to control the minimum stride, for a memory
>> reference, after which the loop prefetch pass may issue software prefetch
>> hints for. There are two motivations:
>>
>> * Make the pass less aggressive, only issuing prefetch hints for 
>> bigger strides
>> that are more likely to benefit from prefetching. I've noticed a case 
>> in cpu2017
>> where we were issuing thousands of hints, for example.
>>
> 
> I've noticed a large amount of prefetch hints being issued as well, but 
> had not
> analysed it further.
> 

I've gathered some numbers for this. Some of the most extreme cases 
before both patches:

CPU2017

xalancbmk_s: 3755 hints
wrf_s: 10950 hints
parest_r: 8521 hints

CPU2006

gamess: 11377 hints
wrf: 3238 hints

After both patches:

CPU2017

xalancbmk_s: 1 hint
wrf_s: 20 hints
parest_r: 0 hints

CPU2006

gamess: 44 hints
wrf: 16 hints


>> * For processors that have a hardware prefetcher, like Falkor, it 
>> allows the
>> loop prefetch pass to defer prefetching of smaller (less than the 
>> threshold)
>> strides to the hardware prefetcher instead. This prevents conflicts 
>> between
>> the software prefetcher and the hardware prefetcher.
>>
>> I've noticed considerable reduction in the number of prefetch hints and
>> slightly positive performance numbers. This aligns GCC and LLVM in 
>> terms of
>> prefetch behavior for Falkor.
> 
> Do you, by any chance, have a link to the LLVM review that implemented 
> that behavior?
> It's okay if you don't, but I think it would be useful context.
> 

I've dug it up. The base change was implemented here:

review: https://reviews.llvm.org/D17945
RFC: http://lists.llvm.org/pipermail/llvm-dev/2015-December/093514.html

And then target-specific changes were introduced later for specific 
processors.

One small difference in LLVM is the fact that the second parameter, 
prefetching of non-constant strides, is implicitly switched off if one 
sets the minimum stride length. My approach here makes that second 
parameter adjustable.

I've seen big gains due to prefetching of non-constant strides, but it 
tends to be tricky to control and usually comes together with 
significant regressions as well.

The fact that we potentially unroll loops along with issuing prefetch 
hints also makes things a bit erratic.

>>
>> The default settings should guarantee no changes for existing targets. 
>> Those
>> are free to tweak the settings as necessary.
>>
>> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>>
>> Ok?
>>
> 
> Are there any benchmark numbers you can share?
> I think this approach is sensible.
> 

Comparing the previous, more aggressive, pass behavior with the new one 
i've seen a slight improvement for CPU2006, 0.15% for both INT and FP.

For CPU2017 the previous behavior was actually a bit harmful, regressing 
performance by about 1.2% in intspeed. The new behavior kept intspeed 
stable and slightly improved fpspeed by 0.15%.

The motivation for the future is to have better control of software 
prefetching so we can fine-tune the pass, either through generic loop 
prefetch code or by using the target-specific parameters.

> Since your patch touches generic code as well as AArch64
> code you'll need an approval from a midend maintainer as well as an 
> AArch64 maintainer.
> Also, GCC development is now in the regression fixing stage, so unless 
> this fixes a regression
> it may have to wait until GCC 9 development is opened.

That is my understanding. I thought i'd put this up for review anyway so 
people can chime in and provide their thoughts.

Thanks for the review.

Luis

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
  2018-01-23  9:53   ` Kyrill Tkachov
@ 2018-01-23 13:32     ` Luis Machado
  0 siblings, 0 replies; 28+ messages in thread
From: Luis Machado @ 2018-01-23 13:32 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches; +Cc: James Greenhalgh, Richard Earnshaw

Hi,

On 01/23/2018 07:40 AM, Kyrill Tkachov wrote:
> Hi Luis,
> 
> On 22/01/18 13:46, Luis Machado wrote:
>> The following patch adds an option to control software prefetching of 
>> memory
>> references with non-constant/unknown strides.
>>
>> Currently we prefetch these references if the pass thinks there is 
>> benefit to
>> doing so. But, since this is all based on heuristics, it's not always 
>> the case
>> that we end up with better performance.
>>
>> For Falkor there is also the problem of conflicts with the hardware 
>> prefetcher,
>> so we need to be more conservative in terms of what we issue software 
>> prefetch
>> hints for.
>>
>> This also aligns GCC with what LLVM does for Falkor.
>>
>> Similarly to the previous patch, the defaults guarantee no change in 
>> behavior
>> for other targets and architectures.
>>
>> I've regression-tested and bootstrapped it on aarch64-linux. No 
>> problems found.
>>
>> Ok?
>>
> 
> This also looks like a sensible approach to me with a caveat inline.
> The same general comments as for patch [1/2] apply.
>> diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> b/gcc/config/aarch64/aarch64-protos.h
>> index 8736bd9..22bd9ae 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -230,6 +230,9 @@ struct cpu_prefetch_tune
>>    const int l1_cache_size;
>>    const int l1_cache_line_size;
>>    const int l2_cache_size;
>> +  /* Whether software prefetch hints should be issued for non-constant
>> +     strides.  */
>> +  const unsigned int prefetch_dynamic_strides;
> 
> I understand that the midend PARAMs are defined as integers, but I think
> the backend tuning option here is better represented as a boolean as it 
> really
> is just a yes/no decision.
>
I started off with a boolean to be honest. Then i noticed the midend 
only used integers, which i restricted to the range of 0..1.

I'll change this locally to use booleans again.

Thanks!
Luis

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-01-22 14:01 ` [PATCH 1/2] Introduce prefetch-minimum stride option Luis Machado
  2018-01-23  9:46   ` Kyrill Tkachov
@ 2018-05-01 18:30   ` Jeff Law
  2018-05-07 14:10     ` Luis Machado
  2018-05-14 21:21   ` Luis Machado
  2 siblings, 1 reply; 28+ messages in thread
From: Jeff Law @ 2018-05-01 18:30 UTC (permalink / raw)
  To: Luis Machado, gcc-patches; +Cc: james.greenhalgh, Richard.Earnshaw

On 01/22/2018 06:46 AM, Luis Machado wrote:
> This patch adds a new option to control the minimum stride, for a memory
> reference, after which the loop prefetch pass may issue software prefetch
> hints for. There are two motivations:
> 
> * Make the pass less aggressive, only issuing prefetch hints for bigger strides
> that are more likely to benefit from prefetching. I've noticed a case in cpu2017
> where we were issuing thousands of hints, for example.
> 
> * For processors that have a hardware prefetcher, like Falkor, it allows the
> loop prefetch pass to defer prefetching of smaller (less than the threshold)
> strides to the hardware prefetcher instead. This prevents conflicts between
> the software prefetcher and the hardware prefetcher.
> 
> I've noticed considerable reduction in the number of prefetch hints and
> slightly positive performance numbers. This aligns GCC and LLVM in terms of
> prefetch behavior for Falkor.
> 
> The default settings should guarantee no changes for existing targets. Those
> are free to tweak the settings as necessary.
> 
> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
> 
> Ok?
> 
> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
> 
> 	Introduce option to limit software prefetching to known constant
> 	strides above a specific threshold with the goal of preventing
> 	conflicts with a hardware prefetcher.
> 
> 	gcc/
> 	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
> 	<minimum_stride>: New const int field.
> 	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
> 	minimum_stride field.
> 	(exynosm1_prefetch_tune): Likewise.
> 	(thunderxt88_prefetch_tune): Likewise.
> 	(thunderx_prefetch_tune): Likewise.
> 	(thunderx2t99_prefetch_tune): Likewise.
> 	(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
> 	(aarch64_override_options_internal): Update to set
> 	PARAM_PREFETCH_MINIMUM_STRIDE.
> 	* doc/invoke.texi (prefetch-minimum-stride): Document new option.
> 	* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
> 	* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
> 	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
> 	stride is constant and is below the minimum stride threshold.
OK for the trunk.
jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
  2018-01-22 14:10 ` [PATCH 2/2] Introduce prefetch-dynamic-strides option Luis Machado
  2018-01-23  9:53   ` Kyrill Tkachov
@ 2018-05-01 18:31   ` Jeff Law
  2018-05-07 14:13     ` Luis Machado
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff Law @ 2018-05-01 18:31 UTC (permalink / raw)
  To: Luis Machado, gcc-patches; +Cc: james.greenhalgh, Richard.Earnshaw

On 01/22/2018 06:46 AM, Luis Machado wrote:
> The following patch adds an option to control software prefetching of memory
> references with non-constant/unknown strides.
> 
> Currently we prefetch these references if the pass thinks there is benefit to
> doing so. But, since this is all based on heuristics, it's not always the case
> that we end up with better performance.
> 
> For Falkor there is also the problem of conflicts with the hardware prefetcher,
> so we need to be more conservative in terms of what we issue software prefetch
> hints for.
> 
> This also aligns GCC with what LLVM does for Falkor.
> 
> Similarly to the previous patch, the defaults guarantee no change in behavior
> for other targets and architectures.
> 
> I've regression-tested and bootstrapped it on aarch64-linux. No problems found.
> 
> Ok?
> 
> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
> 
> 	Introduce option to control whether the software prefetch pass should
> 	issue prefetch hints for non-constant strides.
> 
> 	gcc/
> 	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
> 	<prefetch_dynamic_strides>: New const unsigned int field.
> 	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
> 	prefetch_dynamic_strides.
> 	(exynosm1_prefetch_tune): Likewise.
> 	(thunderxt88_prefetch_tune): Likewise.
> 	(thunderx_prefetch_tune): Likewise.
> 	(thunderx2t99_prefetch_tune): Likewise.
> 	(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.
> 	(aarch64_override_options_internal): Update to set
> 	PARAM_PREFETCH_DYNAMIC_STRIDES.
> 	* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
> 	* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
> 	* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
> 	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
> 	prefetch-dynamic-strides setting.
OK for the trunk.
jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-01 18:30   ` Jeff Law
@ 2018-05-07 14:10     ` Luis Machado
  2018-05-07 15:15       ` H.J. Lu
  0 siblings, 1 reply; 28+ messages in thread
From: Luis Machado @ 2018-05-07 14:10 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: james.greenhalgh, Richard.Earnshaw



On 05/01/2018 03:30 PM, Jeff Law wrote:
> On 01/22/2018 06:46 AM, Luis Machado wrote:
>> This patch adds a new option to control the minimum stride, for a memory
>> reference, after which the loop prefetch pass may issue software prefetch
>> hints for. There are two motivations:
>>
>> * Make the pass less aggressive, only issuing prefetch hints for bigger strides
>> that are more likely to benefit from prefetching. I've noticed a case in cpu2017
>> where we were issuing thousands of hints, for example.
>>
>> * For processors that have a hardware prefetcher, like Falkor, it allows the
>> loop prefetch pass to defer prefetching of smaller (less than the threshold)
>> strides to the hardware prefetcher instead. This prevents conflicts between
>> the software prefetcher and the hardware prefetcher.
>>
>> I've noticed considerable reduction in the number of prefetch hints and
>> slightly positive performance numbers. This aligns GCC and LLVM in terms of
>> prefetch behavior for Falkor.
>>
>> The default settings should guarantee no changes for existing targets. Those
>> are free to tweak the settings as necessary.
>>
>> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>>
>> Ok?
>>
>> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
>>
>> 	Introduce option to limit software prefetching to known constant
>> 	strides above a specific threshold with the goal of preventing
>> 	conflicts with a hardware prefetcher.
>>
>> 	gcc/
>> 	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>> 	<minimum_stride>: New const int field.
>> 	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
>> 	minimum_stride field.
>> 	(exynosm1_prefetch_tune): Likewise.
>> 	(thunderxt88_prefetch_tune): Likewise.
>> 	(thunderx_prefetch_tune): Likewise.
>> 	(thunderx2t99_prefetch_tune): Likewise.
>> 	(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
>> 	(aarch64_override_options_internal): Update to set
>> 	PARAM_PREFETCH_MINIMUM_STRIDE.
>> 	* doc/invoke.texi (prefetch-minimum-stride): Document new option.
>> 	* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
>> 	* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
>> 	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
>> 	stride is constant and is below the minimum stride threshold.
> OK for the trunk.
> jeff
> 

Thanks. Committed as revision 259995 now.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
  2018-05-01 18:31   ` Jeff Law
@ 2018-05-07 14:13     ` Luis Machado
  0 siblings, 0 replies; 28+ messages in thread
From: Luis Machado @ 2018-05-07 14:13 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: james.greenhalgh, Richard.Earnshaw



On 05/01/2018 03:30 PM, Jeff Law wrote:
> On 01/22/2018 06:46 AM, Luis Machado wrote:
>> The following patch adds an option to control software prefetching of memory
>> references with non-constant/unknown strides.
>>
>> Currently we prefetch these references if the pass thinks there is benefit to
>> doing so. But, since this is all based on heuristics, it's not always the case
>> that we end up with better performance.
>>
>> For Falkor there is also the problem of conflicts with the hardware prefetcher,
>> so we need to be more conservative in terms of what we issue software prefetch
>> hints for.
>>
>> This also aligns GCC with what LLVM does for Falkor.
>>
>> Similarly to the previous patch, the defaults guarantee no change in behavior
>> for other targets and architectures.
>>
>> I've regression-tested and bootstrapped it on aarch64-linux. No problems found.
>>
>> Ok?
>>
>> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
>>
>> 	Introduce option to control whether the software prefetch pass should
>> 	issue prefetch hints for non-constant strides.
>>
>> 	gcc/
>> 	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>> 	<prefetch_dynamic_strides>: New const unsigned int field.
>> 	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
>> 	prefetch_dynamic_strides.
>> 	(exynosm1_prefetch_tune): Likewise.
>> 	(thunderxt88_prefetch_tune): Likewise.
>> 	(thunderx_prefetch_tune): Likewise.
>> 	(thunderx2t99_prefetch_tune): Likewise.
>> 	(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.
>> 	(aarch64_override_options_internal): Update to set
>> 	PARAM_PREFETCH_DYNAMIC_STRIDES.
>> 	* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
>> 	* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
>> 	* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
>> 	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
>> 	prefetch-dynamic-strides setting.
> OK for the trunk.
> jeff
> 

Thanks. Committed as r259996.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-07 14:10     ` Luis Machado
@ 2018-05-07 15:15       ` H.J. Lu
  2018-05-07 15:51         ` Luis Machado
  0 siblings, 1 reply; 28+ messages in thread
From: H.J. Lu @ 2018-05-07 15:15 UTC (permalink / raw)
  To: Luis Machado; +Cc: Jeff Law, GCC Patches, James Greenhalgh, Richard Earnshaw

On Mon, May 7, 2018 at 7:09 AM, Luis Machado <luis.machado@linaro.org> wrote:
>
>
> On 05/01/2018 03:30 PM, Jeff Law wrote:
>>
>> On 01/22/2018 06:46 AM, Luis Machado wrote:
>>>
>>> This patch adds a new option to control the minimum stride, for a memory
>>> reference, after which the loop prefetch pass may issue software prefetch
>>> hints for. There are two motivations:
>>>
>>> * Make the pass less aggressive, only issuing prefetch hints for bigger
>>> strides
>>> that are more likely to benefit from prefetching. I've noticed a case in
>>> cpu2017
>>> where we were issuing thousands of hints, for example.
>>>
>>> * For processors that have a hardware prefetcher, like Falkor, it allows
>>> the
>>> loop prefetch pass to defer prefetching of smaller (less than the
>>> threshold)
>>> strides to the hardware prefetcher instead. This prevents conflicts
>>> between
>>> the software prefetcher and the hardware prefetcher.
>>>
>>> I've noticed considerable reduction in the number of prefetch hints and
>>> slightly positive performance numbers. This aligns GCC and LLVM in terms
>>> of
>>> prefetch behavior for Falkor.
>>>
>>> The default settings should guarantee no changes for existing targets.
>>> Those
>>> are free to tweak the settings as necessary.
>>>
>>> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>>>
>>> Ok?
>>>
>>> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
>>>
>>>         Introduce option to limit software prefetching to known constant
>>>         strides above a specific threshold with the goal of preventing
>>>         conflicts with a hardware prefetcher.
>>>
>>>         gcc/
>>>         * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>>>         <minimum_stride>: New const int field.
>>>         * config/aarch64/aarch64.c (generic_prefetch_tune): Update to
>>> include
>>>         minimum_stride field.
>>>         (exynosm1_prefetch_tune): Likewise.
>>>         (thunderxt88_prefetch_tune): Likewise.
>>>         (thunderx_prefetch_tune): Likewise.
>>>         (thunderx2t99_prefetch_tune): Likewise.
>>>         (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
>>>         (aarch64_override_options_internal): Update to set
>>>         PARAM_PREFETCH_MINIMUM_STRIDE.
>>>         * doc/invoke.texi (prefetch-minimum-stride): Document new option.
>>>         * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
>>>         * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
>>>         * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return
>>> false if
>>>         stride is constant and is below the minimum stride threshold.
>>
>> OK for the trunk.
>> jeff
>>
>
> Thanks. Committed as revision 259995 now.

This breaks bootstrap on x86:

../../src-trunk/gcc/tree-ssa-loop-prefetch.c: In function ‘bool
should_issue_prefetch_p(mem_ref*)’:
../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1010:54: error:
comparison of integer expressions of different signedness: ‘long long
unsigned int’ and ‘int’ [-Werror=sign-compare]
       && absu_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1014:4: error: format
‘%d’ expects argument of type ‘int’, but argument 5 has type ‘long
long int’ [-Werror=format=]
    "Step for reference %u:%u (%d) is less than the mininum "
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    "required stride of %d\n",
    ~~~~~~~~~~~~~~~~~~~~~~~~~
    ref->group->uid, ref->uid, int_cst_value (ref->group->step),
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- 
H.J.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-07 15:15       ` H.J. Lu
@ 2018-05-07 15:51         ` Luis Machado
  0 siblings, 0 replies; 28+ messages in thread
From: Luis Machado @ 2018-05-07 15:51 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jeff Law, GCC Patches, James Greenhalgh, Richard Earnshaw



On 05/07/2018 12:15 PM, H.J. Lu wrote:
> On Mon, May 7, 2018 at 7:09 AM, Luis Machado <luis.machado@linaro.org> wrote:
>>
>>
>> On 05/01/2018 03:30 PM, Jeff Law wrote:
>>>
>>> On 01/22/2018 06:46 AM, Luis Machado wrote:
>>>>
>>>> This patch adds a new option to control the minimum stride, for a memory
>>>> reference, after which the loop prefetch pass may issue software prefetch
>>>> hints for. There are two motivations:
>>>>
>>>> * Make the pass less aggressive, only issuing prefetch hints for bigger
>>>> strides
>>>> that are more likely to benefit from prefetching. I've noticed a case in
>>>> cpu2017
>>>> where we were issuing thousands of hints, for example.
>>>>
>>>> * For processors that have a hardware prefetcher, like Falkor, it allows
>>>> the
>>>> loop prefetch pass to defer prefetching of smaller (less than the
>>>> threshold)
>>>> strides to the hardware prefetcher instead. This prevents conflicts
>>>> between
>>>> the software prefetcher and the hardware prefetcher.
>>>>
>>>> I've noticed considerable reduction in the number of prefetch hints and
>>>> slightly positive performance numbers. This aligns GCC and LLVM in terms
>>>> of
>>>> prefetch behavior for Falkor.
>>>>
>>>> The default settings should guarantee no changes for existing targets.
>>>> Those
>>>> are free to tweak the settings as necessary.
>>>>
>>>> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>>>>
>>>> Ok?
>>>>
>>>> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
>>>>
>>>>          Introduce option to limit software prefetching to known constant
>>>>          strides above a specific threshold with the goal of preventing
>>>>          conflicts with a hardware prefetcher.
>>>>
>>>>          gcc/
>>>>          * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>>>>          <minimum_stride>: New const int field.
>>>>          * config/aarch64/aarch64.c (generic_prefetch_tune): Update to
>>>> include
>>>>          minimum_stride field.
>>>>          (exynosm1_prefetch_tune): Likewise.
>>>>          (thunderxt88_prefetch_tune): Likewise.
>>>>          (thunderx_prefetch_tune): Likewise.
>>>>          (thunderx2t99_prefetch_tune): Likewise.
>>>>          (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
>>>>          (aarch64_override_options_internal): Update to set
>>>>          PARAM_PREFETCH_MINIMUM_STRIDE.
>>>>          * doc/invoke.texi (prefetch-minimum-stride): Document new option.
>>>>          * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
>>>>          * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
>>>>          * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return
>>>> false if
>>>>          stride is constant and is below the minimum stride threshold.
>>>
>>> OK for the trunk.
>>> jeff
>>>
>>
>> Thanks. Committed as revision 259995 now.
> 
> This breaks bootstrap on x86:
> 
> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c: In function ‘bool
> should_issue_prefetch_p(mem_ref*)’:
> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1010:54: error:
> comparison of integer expressions of different signedness: ‘long long
> unsigned int’ and ‘int’ [-Werror=sign-compare]
>         && absu_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1014:4: error: format
> ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘long
> long int’ [-Werror=format=]
>      "Step for reference %u:%u (%d) is less than the mininum "
>      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      "required stride of %d\n",
>      ~~~~~~~~~~~~~~~~~~~~~~~~~
>      ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 

I've reverted this for now while i address the bootstrap problem.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-01-22 14:01 ` [PATCH 1/2] Introduce prefetch-minimum stride option Luis Machado
  2018-01-23  9:46   ` Kyrill Tkachov
  2018-05-01 18:30   ` Jeff Law
@ 2018-05-14 21:21   ` Luis Machado
  2018-05-15  9:59     ` Kyrill Tkachov
  2 siblings, 1 reply; 28+ messages in thread
From: Luis Machado @ 2018-05-14 21:21 UTC (permalink / raw)
  To: gcc-patches
  Cc: james.greenhalgh, Richard.Earnshaw, Kyrill Tkachov, H.J. Lu, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 1563 bytes --]

Hi,

Here's an updated version of the patch (now reverted) that addresses the 
previous bootstrap problem (signedness and long long/int conversion).

I've checked that it bootstraps properly on both aarch64-linux and 
x86_64-linux and that tests look sane.

James, would you please give this one a try to see if you can still 
reproduce PR85682? I couldn't reproduce it in multiple attempts.

Thanks,
Luis

On 01/22/2018 11:46 AM, Luis Machado wrote:
> This patch adds a new option to control the minimum stride, for a memory
> reference, after which the loop prefetch pass may issue software prefetch
> hints for. There are two motivations:
> 
> * Make the pass less aggressive, only issuing prefetch hints for bigger strides
> that are more likely to benefit from prefetching. I've noticed a case in cpu2017
> where we were issuing thousands of hints, for example.
> 
> * For processors that have a hardware prefetcher, like Falkor, it allows the
> loop prefetch pass to defer prefetching of smaller (less than the threshold)
> strides to the hardware prefetcher instead. This prevents conflicts between
> the software prefetcher and the hardware prefetcher.
> 
> I've noticed considerable reduction in the number of prefetch hints and
> slightly positive performance numbers. This aligns GCC and LLVM in terms of
> prefetch behavior for Falkor.
> 
> The default settings should guarantee no changes for existing targets. Those
> are free to tweak the settings as necessary.
> 
> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
> 
> Ok?



[-- Attachment #2: 0001-Introduce-prefetch-minimum-stride-option.patch --]
[-- Type: text/x-patch, Size: 8800 bytes --]

From e0207950a6d7793cdaceaa71fc5ada05a93dc1b3 Mon Sep 17 00:00:00 2001
From: Luis Machado <luis.machado@linaro.org>
Date: Thu, 21 Dec 2017 15:23:59 -0200
Subject: [PATCH 1/2] Introduce prefetch-minimum stride option

This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.

* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.

I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.

The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux and
x86_64-linux.

Ok?

2018-05-14  Luis Machado  <luis.machado@linaro.org>

	Introduce option to limit software prefetching to known constant
	strides above a specific threshold with the goal of preventing
	conflicts with a hardware prefetcher.

	gcc/
	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
	<minimum_stride>: New const int field.
	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
	minimum_stride field.
	(exynosm1_prefetch_tune): Likewise.
	(thunderxt88_prefetch_tune): Likewise.
	(thunderx_prefetch_tune): Likewise.
	(thunderx2t99_prefetch_tune): Likewise.
	(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
	<default_opt_level>: Set to 3.
	(aarch64_override_options_internal): Update to set
	PARAM_PREFETCH_MINIMUM_STRIDE.
	* doc/invoke.texi (prefetch-minimum-stride): Document new option.
	* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
	* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
	stride is constant and is below the minimum stride threshold.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 13 ++++++++++++-
 gcc/doc/invoke.texi                 | 15 +++++++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 17 +++++++++++++++++
 6 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index cda2895..5d3b9d7 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* The minimum constant stride beyond which we should use prefetch
+     hints for.  */
+  const int minimum_stride;
   const int default_opt_level;
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a2003fe..5215deb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   512,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  2048,			/* minimum_stride */
+  3			/* default_opt_level  */
 };
 
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
@@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   16*1024,		/* l2_cache_size  */
+  -1,			/* minimum_stride */
   3			/* default_opt_level  */
 };
 
@@ -583,6 +587,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   256,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -10617,6 +10623,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			   aarch64_tune_params.prefetch->l2_cache_size,
 			   opts->x_param_values,
 			   global_options_set.x_param_values);
+  if (aarch64_tune_params.prefetch->minimum_stride >= 0)
+    maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
+			   aarch64_tune_params.prefetch->minimum_stride,
+			   opts->x_param_values,
+			   global_options_set.x_param_values);
 
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6019e1f..c92153e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10718,6 +10718,21 @@ The size of L1 cache, in kilobytes.
 @item l2-cache-size
 The size of L2 cache, in kilobytes.
 
+@item prefetch-minimum-stride
+Minimum constant stride, in bytes, to start using prefetch hints for.  If
+the stride is less than this threshold, prefetch hints will not be issued.
+
+This setting is useful for processors that have hardware prefetchers, in
+which case there may be conflicts between the hardware prefetchers and
+the software prefetchers.  If the hardware prefetchers have a maximum
+stride they can handle, it should be used here to improve the use of
+software prefetchers.
+
+A value of -1, the default, means we don't have a threshold and therefore
+prefetch hints can be issued for any constant stride.
+
+This setting is only useful for strides that are known and constant.
+
 @item loop-interchange-max-num-stmts
 The maximum number of stmts in a loop to be interchanged.
 
diff --git a/gcc/params.def b/gcc/params.def
index dad47ec..2166deb 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
 	  "The size of L2 cache.",
 	  512, 0, 0)
 
+/* The minimum constant stride beyond which we should use prefetch hints
+   for.  */
+
+DEFPARAM (PARAM_PREFETCH_MINIMUM_STRIDE,
+	  "prefetch-minimum-stride",
+	  "The minimum constant stride beyond which we should use prefetch "
+	  "hints for.",
+	  -1, 0, 0)
+
 /* Maximum number of statements in loop nest for loop interchange.  */
 
 DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
diff --git a/gcc/params.h b/gcc/params.h
index 98249d2..96012db 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -196,6 +196,8 @@ extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
 #define L2_CACHE_SIZE \
   PARAM_VALUE (PARAM_L2_CACHE_SIZE)
+#define PREFETCH_MINIMUM_STRIDE \
+  PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
 #define USE_CANONICAL_TYPES \
   PARAM_VALUE (PARAM_USE_CANONICAL_TYPES)
 #define IRA_MAX_LOOPS_NUM \
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 2f10db1..e9d8e5f 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Some processors may have a hardware prefetcher that may conflict with
+     prefetch hints for a range of strides.  Make sure we don't issue
+     prefetches for such cases if the stride is within this particular
+     range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+      && abs_hwi (int_cst_value (ref->group->step)) <
+	  (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Step for reference %u:%u (%ld) is less than the mininum "
+		 "required stride of %d\n",
+		 ref->group->uid, ref->uid, int_cst_value (ref->group->step),
+		 PREFETCH_MINIMUM_STRIDE);
+      return false;
+    }
+
   /* For now do not issue prefetches for only first few of the
      iterations.  */
   if (ref->prefetch_before != PREFETCH_ALL)
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-14 21:21   ` Luis Machado
@ 2018-05-15  9:59     ` Kyrill Tkachov
  2018-05-15 11:21       ` Luis Machado
  0 siblings, 1 reply; 28+ messages in thread
From: Kyrill Tkachov @ 2018-05-15  9:59 UTC (permalink / raw)
  To: Luis Machado, gcc-patches
  Cc: james.greenhalgh, Richard.Earnshaw, H.J. Lu, Jeff Law

Hi Luis,

On 14/05/18 22:18, Luis Machado wrote:
> Hi,
>
> Here's an updated version of the patch (now reverted) that addresses the previous bootstrap problem (signedness and long long/int conversion).
>
> I've checked that it bootstraps properly on both aarch64-linux and x86_64-linux and that tests look sane.
>
> James, would you please give this one a try to see if you can still reproduce PR85682? I couldn't reproduce it in multiple attempts.
>

The patch doesn't hit the regressions in PR85682 from what I can see.
I have a comment on the patch below.

> Thanks,
> Luis
>
> On 01/22/2018 11:46 AM, Luis Machado wrote:
>> This patch adds a new option to control the minimum stride, for a memory
>> reference, after which the loop prefetch pass may issue software prefetch
>> hints for. There are two motivations:
>>
>> * Make the pass less aggressive, only issuing prefetch hints for bigger strides
>> that are more likely to benefit from prefetching. I've noticed a case in cpu2017
>> where we were issuing thousands of hints, for example.
>>
>> * For processors that have a hardware prefetcher, like Falkor, it allows the
>> loop prefetch pass to defer prefetching of smaller (less than the threshold)
>> strides to the hardware prefetcher instead. This prevents conflicts between
>> the software prefetcher and the hardware prefetcher.
>>
>> I've noticed considerable reduction in the number of prefetch hints and
>> slightly positive performance numbers. This aligns GCC and LLVM in terms of
>> prefetch behavior for Falkor.
>>
>> The default settings should guarantee no changes for existing targets. Those
>> are free to tweak the settings as necessary.
>>
>> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>>
>> Ok?
>
>

--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
  static bool
  should_issue_prefetch_p (struct mem_ref *ref)
  {
+  /* Some processors may have a hardware prefetcher that may conflict with
+     prefetch hints for a range of strides.  Make sure we don't issue
+     prefetches for such cases if the stride is within this particular
+     range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+      && abs_hwi (int_cst_value (ref->group->step)) <
+	  (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
+    {

The '<' should go on the line below together with PREFETCH_MINIMUM_STRIDE.

Thanks,
Kyrill


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-15  9:59     ` Kyrill Tkachov
@ 2018-05-15 11:21       ` Luis Machado
  2018-05-16  9:22         ` Kyrill Tkachov
  0 siblings, 1 reply; 28+ messages in thread
From: Luis Machado @ 2018-05-15 11:21 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: james.greenhalgh, Richard.Earnshaw, H.J. Lu, Jeff Law

Hi,

On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
> Hi Luis,
> 
> On 14/05/18 22:18, Luis Machado wrote:
>> Hi,
>>
>> Here's an updated version of the patch (now reverted) that addresses 
>> the previous bootstrap problem (signedness and long long/int conversion).
>>
>> I've checked that it bootstraps properly on both aarch64-linux and 
>> x86_64-linux and that tests look sane.
>>
>> James, would you please give this one a try to see if you can still 
>> reproduce PR85682? I couldn't reproduce it in multiple attempts.
>>
> 
> The patch doesn't hit the regressions in PR85682 from what I can see.
> I have a comment on the patch below.
> 

Great. Thanks for checking Kyrill.

> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>   static bool
>   should_issue_prefetch_p (struct mem_ref *ref)
>   {
> +  /* Some processors may have a hardware prefetcher that may conflict with
> +     prefetch hints for a range of strides.  Make sure we don't issue
> +     prefetches for such cases if the stride is within this particular
> +     range.  */
> +  if (cst_and_fits_in_hwi (ref->group->step)
> +      && abs_hwi (int_cst_value (ref->group->step)) <
> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
> +    {
> 
> The '<' should go on the line below together with PREFETCH_MINIMUM_STRIDE.

I've fixed this locally now.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-15 11:21       ` Luis Machado
@ 2018-05-16  9:22         ` Kyrill Tkachov
  2018-05-16 11:53           ` Luis Machado
  0 siblings, 1 reply; 28+ messages in thread
From: Kyrill Tkachov @ 2018-05-16  9:22 UTC (permalink / raw)
  To: Luis Machado, gcc-patches
  Cc: james.greenhalgh, Richard.Earnshaw, H.J. Lu, Jeff Law


On 15/05/18 12:12, Luis Machado wrote:
> Hi,
>
> On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
>> Hi Luis,
>>
>> On 14/05/18 22:18, Luis Machado wrote:
>>> Hi,
>>>
>>> Here's an updated version of the patch (now reverted) that addresses the previous bootstrap problem (signedness and long long/int conversion).
>>>
>>> I've checked that it bootstraps properly on both aarch64-linux and x86_64-linux and that tests look sane.
>>>
>>> James, would you please give this one a try to see if you can still reproduce PR85682? I couldn't reproduce it in multiple attempts.
>>>
>>
>> The patch doesn't hit the regressions in PR85682 from what I can see.
>> I have a comment on the patch below.
>>
>
> Great. Thanks for checking Kyrill.
>
>> --- a/gcc/tree-ssa-loop-prefetch.c
>> +++ b/gcc/tree-ssa-loop-prefetch.c
>> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>>   static bool
>>   should_issue_prefetch_p (struct mem_ref *ref)
>>   {
>> +  /* Some processors may have a hardware prefetcher that may conflict with
>> +     prefetch hints for a range of strides.  Make sure we don't issue
>> +     prefetches for such cases if the stride is within this particular
>> +     range.  */
>> +  if (cst_and_fits_in_hwi (ref->group->step)
>> +      && abs_hwi (int_cst_value (ref->group->step)) <
>> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
>> +    {
>>
>> The '<' should go on the line below together with PREFETCH_MINIMUM_STRIDE.
>
> I've fixed this locally now.

Thanks. I haven't followed the patch in detail, are you looking for midend changes approval since the last version?
Or do you need aarch64 approval?

Kyrill

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-16  9:22         ` Kyrill Tkachov
@ 2018-05-16 11:53           ` Luis Machado
  2018-05-22 18:56             ` Luis Machado
  0 siblings, 1 reply; 28+ messages in thread
From: Luis Machado @ 2018-05-16 11:53 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: james.greenhalgh, Richard.Earnshaw, H.J. Lu, Jeff Law



On 05/16/2018 06:08 AM, Kyrill Tkachov wrote:
> 
> On 15/05/18 12:12, Luis Machado wrote:
>> Hi,
>>
>> On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
>>> Hi Luis,
>>>
>>> On 14/05/18 22:18, Luis Machado wrote:
>>>> Hi,
>>>>
>>>> Here's an updated version of the patch (now reverted) that addresses 
>>>> the previous bootstrap problem (signedness and long long/int 
>>>> conversion).
>>>>
>>>> I've checked that it bootstraps properly on both aarch64-linux and 
>>>> x86_64-linux and that tests look sane.
>>>>
>>>> James, would you please give this one a try to see if you can still 
>>>> reproduce PR85682? I couldn't reproduce it in multiple attempts.
>>>>
>>>
>>> The patch doesn't hit the regressions in PR85682 from what I can see.
>>> I have a comment on the patch below.
>>>
>>
>> Great. Thanks for checking Kyrill.
>>
>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>>>   static bool
>>>   should_issue_prefetch_p (struct mem_ref *ref)
>>>   {
>>> +  /* Some processors may have a hardware prefetcher that may 
>>> conflict with
>>> +     prefetch hints for a range of strides.  Make sure we don't issue
>>> +     prefetches for such cases if the stride is within this particular
>>> +     range.  */
>>> +  if (cst_and_fits_in_hwi (ref->group->step)
>>> +      && abs_hwi (int_cst_value (ref->group->step)) <
>>> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
>>> +    {
>>>
>>> The '<' should go on the line below together with 
>>> PREFETCH_MINIMUM_STRIDE.
>>
>> I've fixed this locally now.
> 
> Thanks. I haven't followed the patch in detail, are you looking for 
> midend changes approval since the last version?
> Or do you need aarch64 approval?

The changes are not substantial, but midend approval i what i was aiming at.

Also the confirmation that PR85682 is no longer happening.

Luis

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-16 11:53           ` Luis Machado
@ 2018-05-22 18:56             ` Luis Machado
  2018-05-22 21:21               ` Jeff Law
  2018-05-23 20:27               ` H.J. Lu
  0 siblings, 2 replies; 28+ messages in thread
From: Luis Machado @ 2018-05-22 18:56 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: james.greenhalgh, Richard.Earnshaw, H.J. Lu, Jeff Law



On 05/16/2018 08:18 AM, Luis Machado wrote:
> 
> 
> On 05/16/2018 06:08 AM, Kyrill Tkachov wrote:
>>
>> On 15/05/18 12:12, Luis Machado wrote:
>>> Hi,
>>>
>>> On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
>>>> Hi Luis,
>>>>
>>>> On 14/05/18 22:18, Luis Machado wrote:
>>>>> Hi,
>>>>>
>>>>> Here's an updated version of the patch (now reverted) that 
>>>>> addresses the previous bootstrap problem (signedness and long 
>>>>> long/int conversion).
>>>>>
>>>>> I've checked that it bootstraps properly on both aarch64-linux and 
>>>>> x86_64-linux and that tests look sane.
>>>>>
>>>>> James, would you please give this one a try to see if you can still 
>>>>> reproduce PR85682? I couldn't reproduce it in multiple attempts.
>>>>>
>>>>
>>>> The patch doesn't hit the regressions in PR85682 from what I can see.
>>>> I have a comment on the patch below.
>>>>
>>>
>>> Great. Thanks for checking Kyrill.
>>>
>>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>>> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>>>>   static bool
>>>>   should_issue_prefetch_p (struct mem_ref *ref)
>>>>   {
>>>> +  /* Some processors may have a hardware prefetcher that may 
>>>> conflict with
>>>> +     prefetch hints for a range of strides.  Make sure we don't issue
>>>> +     prefetches for such cases if the stride is within this particular
>>>> +     range.  */
>>>> +  if (cst_and_fits_in_hwi (ref->group->step)
>>>> +      && abs_hwi (int_cst_value (ref->group->step)) <
>>>> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
>>>> +    {
>>>>
>>>> The '<' should go on the line below together with 
>>>> PREFETCH_MINIMUM_STRIDE.
>>>
>>> I've fixed this locally now.
>>
>> Thanks. I haven't followed the patch in detail, are you looking for 
>> midend changes approval since the last version?
>> Or do you need aarch64 approval?
> 
> The changes are not substantial, but midend approval i what i was aiming 
> at.
> 
> Also the confirmation that PR85682 is no longer happening.

James confirmed PR85682 is no longer reproducible with the updated patch 
and the bootstrap issue is fixed now. So i take it this should be OK to 
push to mainline?

Also, i'd like to discuss the possibility of having these couple options 
backported to GCC 8. As is, the changes don't alter code generation by 
default, but they allow better tuning of the software prefetcher for 
targets that benefit from it.

Maybe after letting the changes bake on mainline enough to be confirmed 
stable?

Thanks,
Luis

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-22 18:56             ` Luis Machado
@ 2018-05-22 21:21               ` Jeff Law
  2018-05-23 20:27               ` H.J. Lu
  1 sibling, 0 replies; 28+ messages in thread
From: Jeff Law @ 2018-05-22 21:21 UTC (permalink / raw)
  To: Luis Machado, Kyrill Tkachov, gcc-patches
  Cc: james.greenhalgh, Richard.Earnshaw, H.J. Lu

On 05/22/2018 12:55 PM, Luis Machado wrote:
> 
> 
> On 05/16/2018 08:18 AM, Luis Machado wrote:
>>
>>
>> On 05/16/2018 06:08 AM, Kyrill Tkachov wrote:
>>>
>>> On 15/05/18 12:12, Luis Machado wrote:
>>>> Hi,
>>>>
>>>> On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
>>>>> Hi Luis,
>>>>>
>>>>> On 14/05/18 22:18, Luis Machado wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Here's an updated version of the patch (now reverted) that
>>>>>> addresses the previous bootstrap problem (signedness and long
>>>>>> long/int conversion).
>>>>>>
>>>>>> I've checked that it bootstraps properly on both aarch64-linux and
>>>>>> x86_64-linux and that tests look sane.
>>>>>>
>>>>>> James, would you please give this one a try to see if you can
>>>>>> still reproduce PR85682? I couldn't reproduce it in multiple
>>>>>> attempts.
>>>>>>
>>>>>
>>>>> The patch doesn't hit the regressions in PR85682 from what I can see.
>>>>> I have a comment on the patch below.
>>>>>
>>>>
>>>> Great. Thanks for checking Kyrill.
>>>>
>>>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>>>> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>>>>>   static bool
>>>>>   should_issue_prefetch_p (struct mem_ref *ref)
>>>>>   {
>>>>> +  /* Some processors may have a hardware prefetcher that may
>>>>> conflict with
>>>>> +     prefetch hints for a range of strides.  Make sure we don't issue
>>>>> +     prefetches for such cases if the stride is within this
>>>>> particular
>>>>> +     range.  */
>>>>> +  if (cst_and_fits_in_hwi (ref->group->step)
>>>>> +      && abs_hwi (int_cst_value (ref->group->step)) <
>>>>> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
>>>>> +    {
>>>>>
>>>>> The '<' should go on the line below together with
>>>>> PREFETCH_MINIMUM_STRIDE.
>>>>
>>>> I've fixed this locally now.
>>>
>>> Thanks. I haven't followed the patch in detail, are you looking for
>>> midend changes approval since the last version?
>>> Or do you need aarch64 approval?
>>
>> The changes are not substantial, but midend approval i what i was
>> aiming at.
>>
>> Also the confirmation that PR85682 is no longer happening.
> 
> James confirmed PR85682 is no longer reproducible with the updated patch
> and the bootstrap issue is fixed now. So i take it this should be OK to
> push to mainline?
> 
> Also, i'd like to discuss the possibility of having these couple options
> backported to GCC 8. As is, the changes don't alter code generation by
> default, but they allow better tuning of the software prefetcher for
> targets that benefit from it.
> 
> Maybe after letting the changes bake on mainline enough to be confirmed
> stable?
OK for the trunk.  But they don't really seem appropriate for the
release branches.  We're primarily concerned with correctness issues on
the release branches.

jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-22 18:56             ` Luis Machado
  2018-05-22 21:21               ` Jeff Law
@ 2018-05-23 20:27               ` H.J. Lu
  2018-05-23 22:34                 ` Luis Machado
  1 sibling, 1 reply; 28+ messages in thread
From: H.J. Lu @ 2018-05-23 20:27 UTC (permalink / raw)
  To: Luis Machado
  Cc: Kyrill Tkachov, GCC Patches, James Greenhalgh, Richard Earnshaw,
	Jeff Law

On Tue, May 22, 2018 at 11:55 AM, Luis Machado <luis.machado@linaro.org> wrote:
>
>
> On 05/16/2018 08:18 AM, Luis Machado wrote:
>>
>>
>>
>> On 05/16/2018 06:08 AM, Kyrill Tkachov wrote:
>>>
>>>
>>> On 15/05/18 12:12, Luis Machado wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
>>>>>
>>>>> Hi Luis,
>>>>>
>>>>> On 14/05/18 22:18, Luis Machado wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Here's an updated version of the patch (now reverted) that addresses
>>>>>> the previous bootstrap problem (signedness and long long/int conversion).
>>>>>>
>>>>>> I've checked that it bootstraps properly on both aarch64-linux and
>>>>>> x86_64-linux and that tests look sane.
>>>>>>
>>>>>> James, would you please give this one a try to see if you can still
>>>>>> reproduce PR85682? I couldn't reproduce it in multiple attempts.
>>>>>>
>>>>>
>>>>> The patch doesn't hit the regressions in PR85682 from what I can see.
>>>>> I have a comment on the patch below.
>>>>>
>>>>
>>>> Great. Thanks for checking Kyrill.
>>>>
>>>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>>>> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>>>>>   static bool
>>>>>   should_issue_prefetch_p (struct mem_ref *ref)
>>>>>   {
>>>>> +  /* Some processors may have a hardware prefetcher that may conflict
>>>>> with
>>>>> +     prefetch hints for a range of strides.  Make sure we don't issue
>>>>> +     prefetches for such cases if the stride is within this particular
>>>>> +     range.  */
>>>>> +  if (cst_and_fits_in_hwi (ref->group->step)
>>>>> +      && abs_hwi (int_cst_value (ref->group->step)) <
>>>>> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
>>>>> +    {
>>>>>
>>>>> The '<' should go on the line below together with
>>>>> PREFETCH_MINIMUM_STRIDE.
>>>>
>>>>
>>>> I've fixed this locally now.
>>>
>>>
>>> Thanks. I haven't followed the patch in detail, are you looking for
>>> midend changes approval since the last version?
>>> Or do you need aarch64 approval?
>>
>>
>> The changes are not substantial, but midend approval i what i was aiming
>> at.
>>
>> Also the confirmation that PR85682 is no longer happening.
>
>
> James confirmed PR85682 is no longer reproducible with the updated patch and
> the bootstrap issue is fixed now. So i take it this should be OK to push to
> mainline?
>
> Also, i'd like to discuss the possibility of having these couple options
> backported to GCC 8. As is, the changes don't alter code generation by
> default, but they allow better tuning of the software prefetcher for targets
> that benefit from it.
>
> Maybe after letting the changes bake on mainline enough to be confirmed
> stable?

It breaks GCC bootstrap on i686:

../../src-trunk/gcc/tree-ssa-loop-prefetch.c: In function ‘bool
should_issue_prefetch_p(mem_ref*)’:
../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1015:4: error: format
‘%ld’ expects argument of type ‘long int’, but argument 5 has type
‘long long int’ [-Werror=format=]
    "Step for reference %u:%u (%ld) is less than the mininum "
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    "required stride of %d\n",
    ~~~~~~~~~~~~~~~~~~~~~~~~~
    ref->group->uid, ref->uid, int_cst_value (ref->group->step),
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- 
H.J.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-23 20:27               ` H.J. Lu
@ 2018-05-23 22:34                 ` Luis Machado
  2018-05-23 22:41                   ` H.J. Lu
  0 siblings, 1 reply; 28+ messages in thread
From: Luis Machado @ 2018-05-23 22:34 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Kyrill Tkachov, GCC Patches, James Greenhalgh, Richard Earnshaw,
	Jeff Law

[-- Attachment #1: Type: text/plain, Size: 3592 bytes --]



On 05/23/2018 05:01 PM, H.J. Lu wrote:
> On Tue, May 22, 2018 at 11:55 AM, Luis Machado <luis.machado@linaro.org> wrote:
>>
>>
>> On 05/16/2018 08:18 AM, Luis Machado wrote:
>>>
>>>
>>>
>>> On 05/16/2018 06:08 AM, Kyrill Tkachov wrote:
>>>>
>>>>
>>>> On 15/05/18 12:12, Luis Machado wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
>>>>>>
>>>>>> Hi Luis,
>>>>>>
>>>>>> On 14/05/18 22:18, Luis Machado wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Here's an updated version of the patch (now reverted) that addresses
>>>>>>> the previous bootstrap problem (signedness and long long/int conversion).
>>>>>>>
>>>>>>> I've checked that it bootstraps properly on both aarch64-linux and
>>>>>>> x86_64-linux and that tests look sane.
>>>>>>>
>>>>>>> James, would you please give this one a try to see if you can still
>>>>>>> reproduce PR85682? I couldn't reproduce it in multiple attempts.
>>>>>>>
>>>>>>
>>>>>> The patch doesn't hit the regressions in PR85682 from what I can see.
>>>>>> I have a comment on the patch below.
>>>>>>
>>>>>
>>>>> Great. Thanks for checking Kyrill.
>>>>>
>>>>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>>>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>>>>> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>>>>>>    static bool
>>>>>>    should_issue_prefetch_p (struct mem_ref *ref)
>>>>>>    {
>>>>>> +  /* Some processors may have a hardware prefetcher that may conflict
>>>>>> with
>>>>>> +     prefetch hints for a range of strides.  Make sure we don't issue
>>>>>> +     prefetches for such cases if the stride is within this particular
>>>>>> +     range.  */
>>>>>> +  if (cst_and_fits_in_hwi (ref->group->step)
>>>>>> +      && abs_hwi (int_cst_value (ref->group->step)) <
>>>>>> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
>>>>>> +    {
>>>>>>
>>>>>> The '<' should go on the line below together with
>>>>>> PREFETCH_MINIMUM_STRIDE.
>>>>>
>>>>>
>>>>> I've fixed this locally now.
>>>>
>>>>
>>>> Thanks. I haven't followed the patch in detail, are you looking for
>>>> midend changes approval since the last version?
>>>> Or do you need aarch64 approval?
>>>
>>>
>>> The changes are not substantial, but midend approval i what i was aiming
>>> at.
>>>
>>> Also the confirmation that PR85682 is no longer happening.
>>
>>
>> James confirmed PR85682 is no longer reproducible with the updated patch and
>> the bootstrap issue is fixed now. So i take it this should be OK to push to
>> mainline?
>>
>> Also, i'd like to discuss the possibility of having these couple options
>> backported to GCC 8. As is, the changes don't alter code generation by
>> default, but they allow better tuning of the software prefetcher for targets
>> that benefit from it.
>>
>> Maybe after letting the changes bake on mainline enough to be confirmed
>> stable?
> 
> It breaks GCC bootstrap on i686:
> 
> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c: In function ‘bool
> should_issue_prefetch_p(mem_ref*)’:
> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1015:4: error: format
> ‘%ld’ expects argument of type ‘long int’, but argument 5 has type
> ‘long long int’ [-Werror=format=]
>      "Step for reference %u:%u (%ld) is less than the mininum "
>      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      "required stride of %d\n",
>      ~~~~~~~~~~~~~~~~~~~~~~~~~
>      ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 

Sorry. Does the following fix it for i686?

I had bootstrapped it for x86_64.

[-- Attachment #2: i686_bootstrap.diff --]
[-- Type: text/x-patch, Size: 688 bytes --]

2018-05-22  Luis Machado  <luis.machado@linaro.org>

	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Cast value
	to HOST_WIDE_INT.

Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
--- gcc/tree-ssa-loop-prefetch.c	(revision 260625)
+++ gcc/tree-ssa-loop-prefetch.c	(working copy)
@@ -1014,7 +1014,8 @@
 	fprintf (dump_file,
 		 "Step for reference %u:%u (%ld) is less than the mininum "
 		 "required stride of %d\n",
-		 ref->group->uid, ref->uid, int_cst_value (ref->group->step),
+		 ref->group->uid, ref->uid,
+		 (HOST_WIDE_INT) int_cst_value (ref->group->step),
 		 PREFETCH_MINIMUM_STRIDE);
       return false;
     }

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-23 22:34                 ` Luis Machado
@ 2018-05-23 22:41                   ` H.J. Lu
  2018-05-23 22:42                     ` H.J. Lu
  0 siblings, 1 reply; 28+ messages in thread
From: H.J. Lu @ 2018-05-23 22:41 UTC (permalink / raw)
  To: Luis Machado
  Cc: Kyrill Tkachov, GCC Patches, James Greenhalgh, Richard Earnshaw,
	Jeff Law

On Wed, May 23, 2018 at 3:29 PM, Luis Machado <luis.machado@linaro.org> wrote:
>
>
> On 05/23/2018 05:01 PM, H.J. Lu wrote:
>>
>> On Tue, May 22, 2018 at 11:55 AM, Luis Machado <luis.machado@linaro.org>
>> wrote:
>>>
>>>
>>>
>>> On 05/16/2018 08:18 AM, Luis Machado wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On 05/16/2018 06:08 AM, Kyrill Tkachov wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 15/05/18 12:12, Luis Machado wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Luis,
>>>>>>>
>>>>>>> On 14/05/18 22:18, Luis Machado wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Here's an updated version of the patch (now reverted) that addresses
>>>>>>>> the previous bootstrap problem (signedness and long long/int
>>>>>>>> conversion).
>>>>>>>>
>>>>>>>> I've checked that it bootstraps properly on both aarch64-linux and
>>>>>>>> x86_64-linux and that tests look sane.
>>>>>>>>
>>>>>>>> James, would you please give this one a try to see if you can still
>>>>>>>> reproduce PR85682? I couldn't reproduce it in multiple attempts.
>>>>>>>>
>>>>>>>
>>>>>>> The patch doesn't hit the regressions in PR85682 from what I can see.
>>>>>>> I have a comment on the patch below.
>>>>>>>
>>>>>>
>>>>>> Great. Thanks for checking Kyrill.
>>>>>>
>>>>>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>>>>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>>>>>> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>>>>>>>    static bool
>>>>>>>    should_issue_prefetch_p (struct mem_ref *ref)
>>>>>>>    {
>>>>>>> +  /* Some processors may have a hardware prefetcher that may
>>>>>>> conflict
>>>>>>> with
>>>>>>> +     prefetch hints for a range of strides.  Make sure we don't
>>>>>>> issue
>>>>>>> +     prefetches for such cases if the stride is within this
>>>>>>> particular
>>>>>>> +     range.  */
>>>>>>> +  if (cst_and_fits_in_hwi (ref->group->step)
>>>>>>> +      && abs_hwi (int_cst_value (ref->group->step)) <
>>>>>>> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
>>>>>>> +    {
>>>>>>>
>>>>>>> The '<' should go on the line below together with
>>>>>>> PREFETCH_MINIMUM_STRIDE.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I've fixed this locally now.
>>>>>
>>>>>
>>>>>
>>>>> Thanks. I haven't followed the patch in detail, are you looking for
>>>>> midend changes approval since the last version?
>>>>> Or do you need aarch64 approval?
>>>>
>>>>
>>>>
>>>> The changes are not substantial, but midend approval i what i was aiming
>>>> at.
>>>>
>>>> Also the confirmation that PR85682 is no longer happening.
>>>
>>>
>>>
>>> James confirmed PR85682 is no longer reproducible with the updated patch
>>> and
>>> the bootstrap issue is fixed now. So i take it this should be OK to push
>>> to
>>> mainline?
>>>
>>> Also, i'd like to discuss the possibility of having these couple options
>>> backported to GCC 8. As is, the changes don't alter code generation by
>>> default, but they allow better tuning of the software prefetcher for
>>> targets
>>> that benefit from it.
>>>
>>> Maybe after letting the changes bake on mainline enough to be confirmed
>>> stable?
>>
>>
>> It breaks GCC bootstrap on i686:
>>
>> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c: In function ‘bool
>> should_issue_prefetch_p(mem_ref*)’:
>> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1015:4: error: format
>> ‘%ld’ expects argument of type ‘long int’, but argument 5 has type
>> ‘long long int’ [-Werror=format=]
>>      "Step for reference %u:%u (%ld) is less than the mininum "
>>      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>      "required stride of %d\n",
>>      ~~~~~~~~~~~~~~~~~~~~~~~~~
>>      ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>
> Sorry. Does the following fix it for i686?
>

Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
--- gcc/tree-ssa-loop-prefetch.c (revision 260625)
+++ gcc/tree-ssa-loop-prefetch.c (working copy)
@@ -1014,7 +1014,8 @@
  fprintf (dump_file,
  "Step for reference %u:%u (%ld) is less than the mininum "
                                              ^^^ Please use
HOST_WIDE_INT_PRINT_DEC
  "required stride of %d\n",
- ref->group->uid, ref->uid, int_cst_value (ref->group->step),
+ ref->group->uid, ref->uid,
+ (HOST_WIDE_INT) int_cst_value (ref->group->step),
  PREFETCH_MINIMUM_STRIDE);


-- 
H.J.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-23 22:41                   ` H.J. Lu
@ 2018-05-23 22:42                     ` H.J. Lu
  2018-05-23 22:45                       ` H.J. Lu
  0 siblings, 1 reply; 28+ messages in thread
From: H.J. Lu @ 2018-05-23 22:42 UTC (permalink / raw)
  To: Luis Machado
  Cc: Kyrill Tkachov, GCC Patches, James Greenhalgh, Richard Earnshaw,
	Jeff Law

On Wed, May 23, 2018 at 3:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 23, 2018 at 3:29 PM, Luis Machado <luis.machado@linaro.org> wrote:
>>
>>
>> On 05/23/2018 05:01 PM, H.J. Lu wrote:
>>>
>>> On Tue, May 22, 2018 at 11:55 AM, Luis Machado <luis.machado@linaro.org>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 05/16/2018 08:18 AM, Luis Machado wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 05/16/2018 06:08 AM, Kyrill Tkachov wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 15/05/18 12:12, Luis Machado wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 05/15/2018 06:37 AM, Kyrill Tkachov wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Luis,
>>>>>>>>
>>>>>>>> On 14/05/18 22:18, Luis Machado wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Here's an updated version of the patch (now reverted) that addresses
>>>>>>>>> the previous bootstrap problem (signedness and long long/int
>>>>>>>>> conversion).
>>>>>>>>>
>>>>>>>>> I've checked that it bootstraps properly on both aarch64-linux and
>>>>>>>>> x86_64-linux and that tests look sane.
>>>>>>>>>
>>>>>>>>> James, would you please give this one a try to see if you can still
>>>>>>>>> reproduce PR85682? I couldn't reproduce it in multiple attempts.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The patch doesn't hit the regressions in PR85682 from what I can see.
>>>>>>>> I have a comment on the patch below.
>>>>>>>>
>>>>>>>
>>>>>>> Great. Thanks for checking Kyrill.
>>>>>>>
>>>>>>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>>>>>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>>>>>>> @@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
>>>>>>>>    static bool
>>>>>>>>    should_issue_prefetch_p (struct mem_ref *ref)
>>>>>>>>    {
>>>>>>>> +  /* Some processors may have a hardware prefetcher that may
>>>>>>>> conflict
>>>>>>>> with
>>>>>>>> +     prefetch hints for a range of strides.  Make sure we don't
>>>>>>>> issue
>>>>>>>> +     prefetches for such cases if the stride is within this
>>>>>>>> particular
>>>>>>>> +     range.  */
>>>>>>>> +  if (cst_and_fits_in_hwi (ref->group->step)
>>>>>>>> +      && abs_hwi (int_cst_value (ref->group->step)) <
>>>>>>>> +      (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
>>>>>>>> +    {
>>>>>>>>
>>>>>>>> The '<' should go on the line below together with
>>>>>>>> PREFETCH_MINIMUM_STRIDE.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I've fixed this locally now.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks. I haven't followed the patch in detail, are you looking for
>>>>>> midend changes approval since the last version?
>>>>>> Or do you need aarch64 approval?
>>>>>
>>>>>
>>>>>
>>>>> The changes are not substantial, but midend approval i what i was aiming
>>>>> at.
>>>>>
>>>>> Also the confirmation that PR85682 is no longer happening.
>>>>
>>>>
>>>>
>>>> James confirmed PR85682 is no longer reproducible with the updated patch
>>>> and
>>>> the bootstrap issue is fixed now. So i take it this should be OK to push
>>>> to
>>>> mainline?
>>>>
>>>> Also, i'd like to discuss the possibility of having these couple options
>>>> backported to GCC 8. As is, the changes don't alter code generation by
>>>> default, but they allow better tuning of the software prefetcher for
>>>> targets
>>>> that benefit from it.
>>>>
>>>> Maybe after letting the changes bake on mainline enough to be confirmed
>>>> stable?
>>>
>>>
>>> It breaks GCC bootstrap on i686:
>>>
>>> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c: In function ‘bool
>>> should_issue_prefetch_p(mem_ref*)’:
>>> ../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1015:4: error: format
>>> ‘%ld’ expects argument of type ‘long int’, but argument 5 has type
>>> ‘long long int’ [-Werror=format=]
>>>      "Step for reference %u:%u (%ld) is less than the mininum "
>>>      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>      "required stride of %d\n",
>>>      ~~~~~~~~~~~~~~~~~~~~~~~~~
>>>      ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>>                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>
>> Sorry. Does the following fix it for i686?
>>
>
> Index: gcc/tree-ssa-loop-prefetch.c
> ===================================================================
> --- gcc/tree-ssa-loop-prefetch.c (revision 260625)
> +++ gcc/tree-ssa-loop-prefetch.c (working copy)
> @@ -1014,7 +1014,8 @@
>   fprintf (dump_file,
>   "Step for reference %u:%u (%ld) is less than the mininum "
>                                               ^^^ Please use
> HOST_WIDE_INT_PRINT_DEC
>   "required stride of %d\n",
> - ref->group->uid, ref->uid, int_cst_value (ref->group->step),
> + ref->group->uid, ref->uid,
> + (HOST_WIDE_INT) int_cst_value (ref->group->step),
>   PREFETCH_MINIMUM_STRIDE);
>
>

Something like this.

-- 
H.J.
---
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index c3e7fd1e529..949a67f360e 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
   fprintf (dump_file,
-      "Step for reference %u:%u (%ld) is less than the mininum "
-      "required stride of %d\n",
+      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_C
+      ") is less than the mininum required stride of %d\n",
       ref->group->uid, ref->uid, int_cst_value (ref->group->step),
       PREFETCH_MINIMUM_STRIDE);
       return false;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-23 22:42                     ` H.J. Lu
@ 2018-05-23 22:45                       ` H.J. Lu
  2018-05-23 23:29                         ` Luis Machado
  0 siblings, 1 reply; 28+ messages in thread
From: H.J. Lu @ 2018-05-23 22:45 UTC (permalink / raw)
  To: Luis Machado
  Cc: Kyrill Tkachov, GCC Patches, James Greenhalgh, Richard Earnshaw,
	Jeff Law

On Wed, May 23, 2018 at 3:41 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, May 23, 2018 at 3:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>
>>> Sorry. Does the following fix it for i686?
>>>
>>
>> Index: gcc/tree-ssa-loop-prefetch.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-prefetch.c (revision 260625)
>> +++ gcc/tree-ssa-loop-prefetch.c (working copy)
>> @@ -1014,7 +1014,8 @@
>>   fprintf (dump_file,
>>   "Step for reference %u:%u (%ld) is less than the mininum "
>>                                               ^^^ Please use
>> HOST_WIDE_INT_PRINT_DEC
>>   "required stride of %d\n",
>> - ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>> + ref->group->uid, ref->uid,
>> + (HOST_WIDE_INT) int_cst_value (ref->group->step),
>>   PREFETCH_MINIMUM_STRIDE);
>>
>>
>
> Something like this.
>
> --
> H.J.
> ---
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index c3e7fd1e529..949a67f360e 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>    fprintf (dump_file,
> -      "Step for reference %u:%u (%ld) is less than the mininum "
> -      "required stride of %d\n",
> +      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_C
> +      ") is less than the mininum required stride of %d\n",
>        ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>        PREFETCH_MINIMUM_STRIDE);
>        return false;

I meant:

diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index c3e7fd1e529..e34b78dc186 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
   fprintf (dump_file,
-      "Step for reference %u:%u (%ld) is less than the mininum "
-      "required stride of %d\n",
+      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_DEC
+      ") is less than the mininum required stride of %d\n",
       ref->group->uid, ref->uid, int_cst_value (ref->group->step),
       PREFETCH_MINIMUM_STRIDE);
       return false;


-- 
H.J.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-23 22:45                       ` H.J. Lu
@ 2018-05-23 23:29                         ` Luis Machado
  2018-05-24  2:51                           ` Jeff Law
  0 siblings, 1 reply; 28+ messages in thread
From: Luis Machado @ 2018-05-23 23:29 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Kyrill Tkachov, GCC Patches, James Greenhalgh, Richard Earnshaw,
	Jeff Law

On 05/23/2018 07:42 PM, H.J. Lu wrote:
> On Wed, May 23, 2018 at 3:41 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, May 23, 2018 at 3:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>>>>
>>>> Sorry. Does the following fix it for i686?
>>>>
>>>
>>> Index: gcc/tree-ssa-loop-prefetch.c
>>> ===================================================================
>>> --- gcc/tree-ssa-loop-prefetch.c (revision 260625)
>>> +++ gcc/tree-ssa-loop-prefetch.c (working copy)
>>> @@ -1014,7 +1014,8 @@
>>>    fprintf (dump_file,
>>>    "Step for reference %u:%u (%ld) is less than the mininum "
>>>                                                ^^^ Please use
>>> HOST_WIDE_INT_PRINT_DEC
>>>    "required stride of %d\n",
>>> - ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>> + ref->group->uid, ref->uid,
>>> + (HOST_WIDE_INT) int_cst_value (ref->group->step),
>>>    PREFETCH_MINIMUM_STRIDE);
>>>
>>>
>>
>> Something like this.
>>
>> --
>> H.J.
>> ---
>> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
>> index c3e7fd1e529..949a67f360e 100644
>> --- a/gcc/tree-ssa-loop-prefetch.c
>> +++ b/gcc/tree-ssa-loop-prefetch.c
>> @@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
>>       {
>>         if (dump_file && (dump_flags & TDF_DETAILS))
>>     fprintf (dump_file,
>> -      "Step for reference %u:%u (%ld) is less than the mininum "
>> -      "required stride of %d\n",
>> +      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_C
>> +      ") is less than the mininum required stride of %d\n",
>>         ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>         PREFETCH_MINIMUM_STRIDE);
>>         return false;
> 
> I meant:
> 
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index c3e7fd1e529..e34b78dc186 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
>       {
>         if (dump_file && (dump_flags & TDF_DETAILS))
>     fprintf (dump_file,
> -      "Step for reference %u:%u (%ld) is less than the mininum "
> -      "required stride of %d\n",
> +      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_DEC
> +      ") is less than the mininum required stride of %d\n",
>         ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>         PREFETCH_MINIMUM_STRIDE);
>         return false;
> 
> 

Pushed now. Sorry for the breakage.

For future reference, is there an i686 system on the gcc farm that i can 
use to validate this?

Thanks,
Luis

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-23 23:29                         ` Luis Machado
@ 2018-05-24  2:51                           ` Jeff Law
  2018-05-24 12:21                             ` Luis Machado
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Law @ 2018-05-24  2:51 UTC (permalink / raw)
  To: Luis Machado, H.J. Lu
  Cc: Kyrill Tkachov, GCC Patches, James Greenhalgh, Richard Earnshaw

On 05/23/2018 04:50 PM, Luis Machado wrote:
> On 05/23/2018 07:42 PM, H.J. Lu wrote:
>> On Wed, May 23, 2018 at 3:41 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Wed, May 23, 2018 at 3:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>
>>>>>
>>>>> Sorry. Does the following fix it for i686?
>>>>>
>>>>
>>>> Index: gcc/tree-ssa-loop-prefetch.c
>>>> ===================================================================
>>>> --- gcc/tree-ssa-loop-prefetch.c (revision 260625)
>>>> +++ gcc/tree-ssa-loop-prefetch.c (working copy)
>>>> @@ -1014,7 +1014,8 @@
>>>>    fprintf (dump_file,
>>>>    "Step for reference %u:%u (%ld) is less than the mininum "
>>>>                                                ^^^ Please use
>>>> HOST_WIDE_INT_PRINT_DEC
>>>>    "required stride of %d\n",
>>>> - ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>>> + ref->group->uid, ref->uid,
>>>> + (HOST_WIDE_INT) int_cst_value (ref->group->step),
>>>>    PREFETCH_MINIMUM_STRIDE);
>>>>
>>>>
>>>
>>> Something like this.
>>>
>>> -- 
>>> H.J.
>>> ---
>>> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
>>> index c3e7fd1e529..949a67f360e 100644
>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>> @@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
>>>       {
>>>         if (dump_file && (dump_flags & TDF_DETAILS))
>>>     fprintf (dump_file,
>>> -      "Step for reference %u:%u (%ld) is less than the mininum "
>>> -      "required stride of %d\n",
>>> +      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_C
>>> +      ") is less than the mininum required stride of %d\n",
>>>         ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>>         PREFETCH_MINIMUM_STRIDE);
>>>         return false;
>>
>> I meant:
>>
>> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
>> index c3e7fd1e529..e34b78dc186 100644
>> --- a/gcc/tree-ssa-loop-prefetch.c
>> +++ b/gcc/tree-ssa-loop-prefetch.c
>> @@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
>>       {
>>         if (dump_file && (dump_flags & TDF_DETAILS))
>>     fprintf (dump_file,
>> -      "Step for reference %u:%u (%ld) is less than the mininum "
>> -      "required stride of %d\n",
>> +      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_DEC
>> +      ") is less than the mininum required stride of %d\n",
>>         ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>         PREFETCH_MINIMUM_STRIDE);
>>         return false;
>>
>>
> 
> Pushed now. Sorry for the breakage.
> 
> For future reference, is there an i686 system on the gcc farm that i can
> use to validate this?
My tester uses gcc45.  In theory you could probably also do it in an
i686 chroot on an x86_64 system.  My tester does that for testing ppc32
on a ppc64 system.

jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] Introduce prefetch-minimum stride option
  2018-05-24  2:51                           ` Jeff Law
@ 2018-05-24 12:21                             ` Luis Machado
  0 siblings, 0 replies; 28+ messages in thread
From: Luis Machado @ 2018-05-24 12:21 UTC (permalink / raw)
  To: Jeff Law, H.J. Lu
  Cc: Kyrill Tkachov, GCC Patches, James Greenhalgh, Richard Earnshaw

On 05/23/2018 10:57 PM, Jeff Law wrote:
> On 05/23/2018 04:50 PM, Luis Machado wrote:
>> On 05/23/2018 07:42 PM, H.J. Lu wrote:
>>> On Wed, May 23, 2018 at 3:41 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Wed, May 23, 2018 at 3:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>>>>
>>>>>> Sorry. Does the following fix it for i686?
>>>>>>
>>>>>
>>>>> Index: gcc/tree-ssa-loop-prefetch.c
>>>>> ===================================================================
>>>>> --- gcc/tree-ssa-loop-prefetch.c (revision 260625)
>>>>> +++ gcc/tree-ssa-loop-prefetch.c (working copy)
>>>>> @@ -1014,7 +1014,8 @@
>>>>>     fprintf (dump_file,
>>>>>     "Step for reference %u:%u (%ld) is less than the mininum "
>>>>>                                                 ^^^ Please use
>>>>> HOST_WIDE_INT_PRINT_DEC
>>>>>     "required stride of %d\n",
>>>>> - ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>>>> + ref->group->uid, ref->uid,
>>>>> + (HOST_WIDE_INT) int_cst_value (ref->group->step),
>>>>>     PREFETCH_MINIMUM_STRIDE);
>>>>>
>>>>>
>>>>
>>>> Something like this.
>>>>
>>>> -- 
>>>> H.J.
>>>> ---
>>>> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
>>>> index c3e7fd1e529..949a67f360e 100644
>>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>>> @@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
>>>>        {
>>>>          if (dump_file && (dump_flags & TDF_DETAILS))
>>>>      fprintf (dump_file,
>>>> -      "Step for reference %u:%u (%ld) is less than the mininum "
>>>> -      "required stride of %d\n",
>>>> +      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_C
>>>> +      ") is less than the mininum required stride of %d\n",
>>>>          ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>>>          PREFETCH_MINIMUM_STRIDE);
>>>>          return false;
>>>
>>> I meant:
>>>
>>> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
>>> index c3e7fd1e529..e34b78dc186 100644
>>> --- a/gcc/tree-ssa-loop-prefetch.c
>>> +++ b/gcc/tree-ssa-loop-prefetch.c
>>> @@ -1012,8 +1012,8 @@ should_issue_prefetch_p (struct mem_ref *ref)
>>>        {
>>>          if (dump_file && (dump_flags & TDF_DETAILS))
>>>      fprintf (dump_file,
>>> -      "Step for reference %u:%u (%ld) is less than the mininum "
>>> -      "required stride of %d\n",
>>> +      "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_DEC
>>> +      ") is less than the mininum required stride of %d\n",
>>>          ref->group->uid, ref->uid, int_cst_value (ref->group->step),
>>>          PREFETCH_MINIMUM_STRIDE);
>>>          return false;
>>>
>>>
>>
>> Pushed now. Sorry for the breakage.
>>
>> For future reference, is there an i686 system on the gcc farm that i can
>> use to validate this?
> My tester uses gcc45.  In theory you could probably also do it in an
> i686 chroot on an x86_64 system.  My tester does that for testing ppc32
> on a ppc64 system.

Got it. Thanks for the info.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2018-05-24 12:20 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-22 13:46 [PATCH 0/2] Add a couple new options to control loop prefetch pass Luis Machado
2018-01-22 14:01 ` [PATCH 1/2] Introduce prefetch-minimum stride option Luis Machado
2018-01-23  9:46   ` Kyrill Tkachov
2018-01-23 13:23     ` Luis Machado
2018-05-01 18:30   ` Jeff Law
2018-05-07 14:10     ` Luis Machado
2018-05-07 15:15       ` H.J. Lu
2018-05-07 15:51         ` Luis Machado
2018-05-14 21:21   ` Luis Machado
2018-05-15  9:59     ` Kyrill Tkachov
2018-05-15 11:21       ` Luis Machado
2018-05-16  9:22         ` Kyrill Tkachov
2018-05-16 11:53           ` Luis Machado
2018-05-22 18:56             ` Luis Machado
2018-05-22 21:21               ` Jeff Law
2018-05-23 20:27               ` H.J. Lu
2018-05-23 22:34                 ` Luis Machado
2018-05-23 22:41                   ` H.J. Lu
2018-05-23 22:42                     ` H.J. Lu
2018-05-23 22:45                       ` H.J. Lu
2018-05-23 23:29                         ` Luis Machado
2018-05-24  2:51                           ` Jeff Law
2018-05-24 12:21                             ` Luis Machado
2018-01-22 14:10 ` [PATCH 2/2] Introduce prefetch-dynamic-strides option Luis Machado
2018-01-23  9:53   ` Kyrill Tkachov
2018-01-23 13:32     ` Luis Machado
2018-05-01 18:31   ` Jeff Law
2018-05-07 14:13     ` Luis Machado

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).