From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
To: Luis Machado <luis.machado@linaro.org>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: James Greenhalgh <James.Greenhalgh@arm.com>,
Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH 1/2] Introduce prefetch-minimum stride option
Date: Tue, 23 Jan 2018 09:46:00 -0000 [thread overview]
Message-ID: <5A670133.4010303@foss.arm.com> (raw)
In-Reply-To: <1516628770-25036-2-git-send-email-luis.machado@linaro.org>
Hi Luis,
On 22/01/18 13:46, Luis Machado wrote:
> This patch adds a new option to control the minimum stride, for a memory
> reference, after which the loop prefetch pass may issue software prefetch
> hints for. There are two motivations:
>
> * Make the pass less aggressive, only issuing prefetch hints for bigger strides
> that are more likely to benefit from prefetching. I've noticed a case in cpu2017
> where we were issuing thousands of hints, for example.
>
I've noticed a large amount of prefetch hints being issued as well, but had not
analysed it further.
> * For processors that have a hardware prefetcher, like Falkor, it allows the
> loop prefetch pass to defer prefetching of smaller (less than the threshold)
> strides to the hardware prefetcher instead. This prevents conflicts between
> the software prefetcher and the hardware prefetcher.
>
> I've noticed considerable reduction in the number of prefetch hints and
> slightly positive performance numbers. This aligns GCC and LLVM in terms of
> prefetch behavior for Falkor.
Do you, by any chance, have a link to the LLVM review that implemented that behavior?
It's okay if you don't, but I think it would be useful context.
>
> The default settings should guarantee no changes for existing targets. Those
> are free to tweak the settings as necessary.
>
> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>
> Ok?
>
Are there any benchmark numbers you can share?
I think this approach is sensible.
Since your patch touches generic code as well as AArch64
code you'll need an approval from a midend maintainer as well as an AArch64 maintainer.
Also, GCC development is now in the regression fixing stage, so unless this fixes a regression
it may have to wait until GCC 9 development is opened.
Thanks,
Kyrill
> 2018-01-22 Luis Machado <luis.machado@linaro.org>
>
> Introduce option to limit software prefetching to known constant
> strides above a specific threshold with the goal of preventing
> conflicts with a hardware prefetcher.
>
> gcc/
> * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
> <minimum_stride>: New const int field.
> * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
> minimum_stride field.
> (exynosm1_prefetch_tune): Likewise.
> (thunderxt88_prefetch_tune): Likewise.
> (thunderx_prefetch_tune): Likewise.
> (thunderx2t99_prefetch_tune): Likewise.
> (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
> (aarch64_override_options_internal): Update to set
> PARAM_PREFETCH_MINIMUM_STRIDE.
> * doc/invoke.texi (prefetch-minimum-stride): Document new option.
> * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
> * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
> * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
> stride is constant and is below the minimum stride threshold.
> ---
> gcc/config/aarch64/aarch64-protos.h | 3 +++
> gcc/config/aarch64/aarch64.c | 13 ++++++++++++-
> gcc/doc/invoke.texi | 15 +++++++++++++++
> gcc/params.def | 9 +++++++++
> gcc/params.h | 2 ++
> gcc/tree-ssa-loop-prefetch.c | 16 ++++++++++++++++
> 6 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index ef1b0bc..8736bd9 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -230,6 +230,9 @@ struct cpu_prefetch_tune
> const int l1_cache_size;
> const int l1_cache_line_size;
> const int l2_cache_size;
> + /* The minimum constant stride beyond which we should use prefetch
> + hints for. */
> + const int minimum_stride;
> const int default_opt_level;
> };
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 174310c..0ed9f14 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
> -1, /* l1_cache_size */
> -1, /* l1_cache_line_size */
> -1, /* l2_cache_size */
> + -1, /* minimum_stride */
> -1 /* default_opt_level */
> };
>
> @@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
> -1, /* l1_cache_size */
> 64, /* l1_cache_line_size */
> -1, /* l2_cache_size */
> + -1, /* minimum_stride */
> -1 /* default_opt_level */
> };
>
> @@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
> 32, /* l1_cache_size */
> 64, /* l1_cache_line_size */
> 1024, /* l2_cache_size */
> - -1 /* default_opt_level */
> + 2048, /* minimum_stride */
> + 3 /* default_opt_level */
> };
>
> static const cpu_prefetch_tune thunderxt88_prefetch_tune =
> @@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
> 32, /* l1_cache_size */
> 128, /* l1_cache_line_size */
> 16*1024, /* l2_cache_size */
> + -1, /* minimum_stride */
> 3 /* default_opt_level */
> };
>
> @@ -583,6 +587,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
> 32, /* l1_cache_size */
> 128, /* l1_cache_line_size */
> -1, /* l2_cache_size */
> + -1, /* minimum_stride */
> -1 /* default_opt_level */
> };
>
> @@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
> 32, /* l1_cache_size */
> 64, /* l1_cache_line_size */
> 256, /* l2_cache_size */
> + -1, /* minimum_stride */
> -1 /* default_opt_level */
> };
>
> @@ -10461,6 +10467,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
> aarch64_tune_params.prefetch->l2_cache_size,
> opts->x_param_values,
> global_options_set.x_param_values);
> + if (aarch64_tune_params.prefetch->minimum_stride >= 0)
> + maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
> + aarch64_tune_params.prefetch->minimum_stride,
> + opts->x_param_values,
> + global_options_set.x_param_values);
>
> /* Use the alternative scheduling-pressure algorithm by default. */
> maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 27c5974..1cb1ef5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10567,6 +10567,21 @@ The size of L1 cache, in kilobytes.
> @item l2-cache-size
> The size of L2 cache, in kilobytes.
>
> +@item prefetch-minimum-stride
> +Minimum constant stride, in bytes, to start using prefetch hints for. If
> +the stride is less than this threshold, prefetch hints will not be issued.
> +
> +This setting is useful for processors that have hardware prefetchers, in
> +which case there may be conflicts between the hardware prefetchers and
> +the software prefetchers. If the hardware prefetchers have a maximum
> +stride they can handle, it should be used here to improve the use of
> +software prefetchers.
> +
> +A value of -1, the default, means we don't have a threshold and therefore
> +prefetch hints can be issued for any constant stride.
> +
> +This setting is only useful for strides that are known and constant.
> +
> @item loop-interchange-max-num-stmts
> The maximum number of stmts in a loop to be interchanged.
>
> diff --git a/gcc/params.def b/gcc/params.def
> index 930b318..bf2d12c 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
> "The size of L2 cache.",
> 512, 0, 0)
>
> +/* The minimum constant stride beyond which we should use prefetch hints
> + for. */
> +
> +DEFPARAM (PARAM_PREFETCH_MINIMUM_STRIDE,
> + "prefetch-minimum-stride",
> + "The minimum constant stride beyond which we should use prefetch "
> + "hints for.",
> + -1, 0, 0)
> +
> /* Maximum number of statements in loop nest for loop interchange. */
>
> DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
> diff --git a/gcc/params.h b/gcc/params.h
> index 98249d2..96012db 100644
> --- a/gcc/params.h
> +++ b/gcc/params.h
> @@ -196,6 +196,8 @@ extern void init_param_values (int *params);
> PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
> #define L2_CACHE_SIZE \
> PARAM_VALUE (PARAM_L2_CACHE_SIZE)
> +#define PREFETCH_MINIMUM_STRIDE \
> + PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
> #define USE_CANONICAL_TYPES \
> PARAM_VALUE (PARAM_USE_CANONICAL_TYPES)
> #define IRA_MAX_LOOPS_NUM \
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 2f10db1..112ccac 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -992,6 +992,22 @@ prune_by_reuse (struct mem_ref_group *groups)
> static bool
> should_issue_prefetch_p (struct mem_ref *ref)
> {
> + /* Some processors may have a hardware prefetcher that may conflict with
> + prefetch hints for a range of strides. Make sure we don't issue
> + prefetches for such cases if the stride is within this particular
> + range. */
> + if (cst_and_fits_in_hwi (ref->group->step)
> + && absu_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
> + {
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> + "Step for reference %u:%u (%d) is less than the mininum "
> + " required stride of %d\n",
> + ref->group->uid, ref->uid, int_cst_value (ref->group->step),
> + PREFETCH_MINIMUM_STRIDE);
> + return false;
> + }
> +
> /* For now do not issue prefetches for only first few of the
> iterations. */
> if (ref->prefetch_before != PREFETCH_ALL)
> --
> 2.7.4
>
next prev parent reply other threads:[~2018-01-23 9:32 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-22 13:46 [PATCH 0/2] Add a couple new options to control loop prefetch pass Luis Machado
2018-01-22 14:01 ` [PATCH 1/2] Introduce prefetch-minimum stride option Luis Machado
2018-01-23 9:46 ` Kyrill Tkachov [this message]
2018-01-23 13:23 ` Luis Machado
2018-05-01 18:30 ` Jeff Law
2018-05-07 14:10 ` Luis Machado
2018-05-07 15:15 ` H.J. Lu
2018-05-07 15:51 ` Luis Machado
2018-05-14 21:21 ` Luis Machado
2018-05-15 9:59 ` Kyrill Tkachov
2018-05-15 11:21 ` Luis Machado
2018-05-16 9:22 ` Kyrill Tkachov
2018-05-16 11:53 ` Luis Machado
2018-05-22 18:56 ` Luis Machado
2018-05-22 21:21 ` Jeff Law
2018-05-23 20:27 ` H.J. Lu
2018-05-23 22:34 ` Luis Machado
2018-05-23 22:41 ` H.J. Lu
2018-05-23 22:42 ` H.J. Lu
2018-05-23 22:45 ` H.J. Lu
2018-05-23 23:29 ` Luis Machado
2018-05-24 2:51 ` Jeff Law
2018-05-24 12:21 ` Luis Machado
2018-01-22 14:10 ` [PATCH 2/2] Introduce prefetch-dynamic-strides option Luis Machado
2018-01-23 9:53 ` Kyrill Tkachov
2018-01-23 13:32 ` Luis Machado
2018-05-01 18:31 ` Jeff Law
2018-05-07 14:13 ` Luis Machado
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5A670133.4010303@foss.arm.com \
--to=kyrylo.tkachov@foss.arm.com \
--cc=James.Greenhalgh@arm.com \
--cc=Richard.Earnshaw@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=luis.machado@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).