public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Luis Machado <luis.machado@linaro.org>
To: gcc-patches@gcc.gnu.org
Cc: james.greenhalgh@arm.com, Richard.Earnshaw@arm.com,
	Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
	"H.J. Lu" <hjl.tools@gmail.com>, Jeff Law <law@redhat.com>
Subject: Re: [PATCH 1/2] Introduce prefetch-minimum stride option
Date: Mon, 14 May 2018 21:21:00 -0000	[thread overview]
Message-ID: <468c1099-3a87-6e95-53c4-3ba62fe3472f@linaro.org> (raw)
In-Reply-To: <1516628770-25036-2-git-send-email-luis.machado@linaro.org>

[-- Attachment #1: Type: text/plain, Size: 1563 bytes --]

Hi,

Here's an updated version of the patch (now reverted) that addresses the 
previous bootstrap problem (signedness and long long/int conversion).

I've checked that it bootstraps properly on both aarch64-linux and 
x86_64-linux and that tests look sane.

James, would you please give this one a try to see if you can still 
reproduce PR85682? I couldn't reproduce it in multiple attempts.

Thanks,
Luis

On 01/22/2018 11:46 AM, Luis Machado wrote:
> This patch adds a new option to control the minimum stride, for a memory
> reference, after which the loop prefetch pass may issue software prefetch
> hints for. There are two motivations:
> 
> * Make the pass less aggressive, only issuing prefetch hints for bigger strides
> that are more likely to benefit from prefetching. I've noticed a case in cpu2017
> where we were issuing thousands of hints, for example.
> 
> * For processors that have a hardware prefetcher, like Falkor, it allows the
> loop prefetch pass to defer prefetching of smaller (less than the threshold)
> strides to the hardware prefetcher instead. This prevents conflicts between
> the software prefetcher and the hardware prefetcher.
> 
> I've noticed considerable reduction in the number of prefetch hints and
> slightly positive performance numbers. This aligns GCC and LLVM in terms of
> prefetch behavior for Falkor.
> 
> The default settings should guarantee no changes for existing targets. Those
> are free to tweak the settings as necessary.
> 
> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
> 
> Ok?



[-- Attachment #2: 0001-Introduce-prefetch-minimum-stride-option.patch --]
[-- Type: text/x-patch, Size: 8800 bytes --]

From e0207950a6d7793cdaceaa71fc5ada05a93dc1b3 Mon Sep 17 00:00:00 2001
From: Luis Machado <luis.machado@linaro.org>
Date: Thu, 21 Dec 2017 15:23:59 -0200
Subject: [PATCH 1/2] Introduce prefetch-minimum stride option

This patch adds a new option to control the minimum stride, for a memory
reference, after which the loop prefetch pass may issue software prefetch
hints for. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.

* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.

I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.

The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux and
x86_64-linux.

Ok?

2018-05-14  Luis Machado  <luis.machado@linaro.org>

	Introduce option to limit software prefetching to known constant
	strides above a specific threshold with the goal of preventing
	conflicts with a hardware prefetcher.

	gcc/
	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
	<minimum_stride>: New const int field.
	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
	minimum_stride field.
	(exynosm1_prefetch_tune): Likewise.
	(thunderxt88_prefetch_tune): Likewise.
	(thunderx_prefetch_tune): Likewise.
	(thunderx2t99_prefetch_tune): Likewise.
	(qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
	<default_opt_level>: Set to 3.
	(aarch64_override_options_internal): Update to set
	PARAM_PREFETCH_MINIMUM_STRIDE.
	* doc/invoke.texi (prefetch-minimum-stride): Document new option.
	* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
	* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
	stride is constant and is below the minimum stride threshold.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 13 ++++++++++++-
 gcc/doc/invoke.texi                 | 15 +++++++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 17 +++++++++++++++++
 6 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index cda2895..5d3b9d7 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* The minimum constant stride beyond which we should use prefetch
+     hints for.  */
+  const int minimum_stride;
   const int default_opt_level;
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a2003fe..5215deb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   512,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  2048,			/* minimum_stride */
+  3			/* default_opt_level  */
 };
 
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
@@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   16*1024,		/* l2_cache_size  */
+  -1,			/* minimum_stride */
   3			/* default_opt_level  */
 };
 
@@ -583,6 +587,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   256,			/* l2_cache_size  */
+  -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
 
@@ -10617,6 +10623,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			   aarch64_tune_params.prefetch->l2_cache_size,
 			   opts->x_param_values,
 			   global_options_set.x_param_values);
+  if (aarch64_tune_params.prefetch->minimum_stride >= 0)
+    maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
+			   aarch64_tune_params.prefetch->minimum_stride,
+			   opts->x_param_values,
+			   global_options_set.x_param_values);
 
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6019e1f..c92153e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10718,6 +10718,21 @@ The size of L1 cache, in kilobytes.
 @item l2-cache-size
 The size of L2 cache, in kilobytes.
 
+@item prefetch-minimum-stride
+Minimum constant stride, in bytes, to start using prefetch hints for.  If
+the stride is less than this threshold, prefetch hints will not be issued.
+
+This setting is useful for processors that have hardware prefetchers, in
+which case there may be conflicts between the hardware prefetchers and
+the software prefetchers.  If the hardware prefetchers have a maximum
+stride they can handle, it should be used here to improve the use of
+software prefetchers.
+
+A value of -1, the default, means we don't have a threshold and therefore
+prefetch hints can be issued for any constant stride.
+
+This setting is only useful for strides that are known and constant.
+
 @item loop-interchange-max-num-stmts
 The maximum number of stmts in a loop to be interchanged.
 
diff --git a/gcc/params.def b/gcc/params.def
index dad47ec..2166deb 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
 	  "The size of L2 cache.",
 	  512, 0, 0)
 
+/* The minimum constant stride beyond which we should use prefetch hints
+   for.  */
+
+DEFPARAM (PARAM_PREFETCH_MINIMUM_STRIDE,
+	  "prefetch-minimum-stride",
+	  "The minimum constant stride beyond which we should use prefetch "
+	  "hints for.",
+	  -1, 0, 0)
+
 /* Maximum number of statements in loop nest for loop interchange.  */
 
 DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
diff --git a/gcc/params.h b/gcc/params.h
index 98249d2..96012db 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -196,6 +196,8 @@ extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
 #define L2_CACHE_SIZE \
   PARAM_VALUE (PARAM_L2_CACHE_SIZE)
+#define PREFETCH_MINIMUM_STRIDE \
+  PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
 #define USE_CANONICAL_TYPES \
   PARAM_VALUE (PARAM_USE_CANONICAL_TYPES)
 #define IRA_MAX_LOOPS_NUM \
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 2f10db1..e9d8e5f 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,23 @@ prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Some processors may have a hardware prefetcher that may conflict with
+     prefetch hints for a range of strides.  Make sure we don't issue
+     prefetches for such cases if the stride is within this particular
+     range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+      && abs_hwi (int_cst_value (ref->group->step)) <
+	  (HOST_WIDE_INT) PREFETCH_MINIMUM_STRIDE)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Step for reference %u:%u (%ld) is less than the mininum "
+		 "required stride of %d\n",
+		 ref->group->uid, ref->uid, int_cst_value (ref->group->step),
+		 PREFETCH_MINIMUM_STRIDE);
+      return false;
+    }
+
   /* For now do not issue prefetches for only first few of the
      iterations.  */
   if (ref->prefetch_before != PREFETCH_ALL)
-- 
2.7.4


  parent reply	other threads:[~2018-05-14 21:18 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-22 13:46 [PATCH 0/2] Add a couple new options to control loop prefetch pass Luis Machado
2018-01-22 14:01 ` [PATCH 1/2] Introduce prefetch-minimum stride option Luis Machado
2018-01-23  9:46   ` Kyrill Tkachov
2018-01-23 13:23     ` Luis Machado
2018-05-01 18:30   ` Jeff Law
2018-05-07 14:10     ` Luis Machado
2018-05-07 15:15       ` H.J. Lu
2018-05-07 15:51         ` Luis Machado
2018-05-14 21:21   ` Luis Machado [this message]
2018-05-15  9:59     ` Kyrill Tkachov
2018-05-15 11:21       ` Luis Machado
2018-05-16  9:22         ` Kyrill Tkachov
2018-05-16 11:53           ` Luis Machado
2018-05-22 18:56             ` Luis Machado
2018-05-22 21:21               ` Jeff Law
2018-05-23 20:27               ` H.J. Lu
2018-05-23 22:34                 ` Luis Machado
2018-05-23 22:41                   ` H.J. Lu
2018-05-23 22:42                     ` H.J. Lu
2018-05-23 22:45                       ` H.J. Lu
2018-05-23 23:29                         ` Luis Machado
2018-05-24  2:51                           ` Jeff Law
2018-05-24 12:21                             ` Luis Machado
2018-01-22 14:10 ` [PATCH 2/2] Introduce prefetch-dynamic-strides option Luis Machado
2018-01-23  9:53   ` Kyrill Tkachov
2018-01-23 13:32     ` Luis Machado
2018-05-01 18:31   ` Jeff Law
2018-05-07 14:13     ` Luis Machado

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=468c1099-3a87-6e95-53c4-3ba62fe3472f@linaro.org \
    --to=luis.machado@linaro.org \
    --cc=Richard.Earnshaw@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hjl.tools@gmail.com \
    --cc=james.greenhalgh@arm.com \
    --cc=kyrylo.tkachov@foss.arm.com \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).