public inbox for gcc-patches@gcc.gnu.org
From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
To: Luis Machado <luis.machado@linaro.org>,
	 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: James Greenhalgh <James.Greenhalgh@arm.com>,
	 Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH 1/2] Introduce prefetch-minimum stride option
Date: Tue, 23 Jan 2018 09:46:00 -0000
Message-ID: <5A670133.4010303@foss.arm.com>
In-Reply-To: <1516628770-25036-2-git-send-email-luis.machado@linaro.org>

Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:
> This patch adds a new option to control the minimum stride, for a memory
> reference, at which the loop prefetch pass may start issuing software
> prefetch hints. There are two motivations:
>
> * Make the pass less aggressive, only issuing prefetch hints for bigger strides
> that are more likely to benefit from prefetching. I've noticed a case in cpu2017
> where we were issuing thousands of hints, for example.
>

I've noticed a large number of prefetch hints being issued as well, but have not
analysed it further.

> * For processors that have a hardware prefetcher, like Falkor, it allows the
> loop prefetch pass to defer prefetching of smaller (less than the threshold)
> strides to the hardware prefetcher instead. This prevents conflicts between
> the software prefetcher and the hardware prefetcher.
>
> I've noticed considerable reduction in the number of prefetch hints and
> slightly positive performance numbers. This aligns GCC and LLVM in terms of
> prefetch behavior for Falkor.

Do you, by any chance, have a link to the LLVM review that implemented that behavior?
It's okay if you don't, but I think it would be useful context.

>
> The default settings should guarantee no changes for existing targets. Those
> are free to tweak the settings as necessary.
>
> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>
> Ok?
>

Are there any benchmark numbers you can share?
I think this approach is sensible.
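
To make the two cases concrete, here is a minimal sketch (not taken from the
patch) of a small-stride and a large-stride access pattern. The function
names and the ROW_BYTES constant are illustrative assumptions; the 2048-byte
threshold mentioned in the comments is simply the value the patch sets for
qdf24xx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row size, chosen so that the column walk below has a
   constant stride of 4096 bytes.  */
#define ROW_BYTES 4096

/* Small constant stride: consecutive bytes.  With a minimum stride of
   2048, the pass would be expected to leave this reference to the
   hardware prefetcher.  */
long
sum_sequential (const unsigned char *buf, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += buf[i];
  return sum;
}

/* Large constant stride: one byte per row, ROW_BYTES apart.  This is
   the kind of reference that would remain a candidate for a software
   prefetch hint.  */
long
sum_column (const unsigned char *base, size_t nrows, size_t col)
{
  assert (col < ROW_BYTES);
  long sum = 0;
  for (size_t i = 0; i < nrows; i++)
    sum += base[i * ROW_BYTES + col];
  return sum;
}
```

With the patch applied, one would expect to compile such a file with
something like `-O2 -fprefetch-loop-arrays --param prefetch-minimum-stride=2048`;
whether hints are actually emitted for either loop still depends on the rest
of the pass's heuristics.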

Since your patch touches generic code as well as AArch64 code, you'll need
approval from a midend maintainer as well as an AArch64 maintainer.
Also, GCC development is now in the regression-fixing stage, so unless this
fixes a regression it may have to wait until GCC 9 development opens.

Thanks,
Kyrill

> 2018-01-22  Luis Machado  <luis.machado@linaro.org>
>
>         Introduce option to limit software prefetching to known constant
>         strides above a specific threshold with the goal of preventing
>         conflicts with a hardware prefetcher.
>
>         gcc/
>         * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>         <minimum_stride>: New const int field.
>         * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
>         minimum_stride field.
>         (exynosm1_prefetch_tune): Likewise.
>         (thunderxt88_prefetch_tune): Likewise.
>         (thunderx_prefetch_tune): Likewise.
>         (thunderx2t99_prefetch_tune): Likewise.
>         (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
>         (aarch64_override_options_internal): Update to set
>         PARAM_PREFETCH_MINIMUM_STRIDE.
>         * doc/invoke.texi (prefetch-minimum-stride): Document new option.
>         * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
>         * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
>         * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
>         stride is constant and is below the minimum stride threshold.
> ---
>  gcc/config/aarch64/aarch64-protos.h |  3 +++
>  gcc/config/aarch64/aarch64.c        | 13 ++++++++++++-
>  gcc/doc/invoke.texi                 | 15 +++++++++++++++
>  gcc/params.def                      |  9 +++++++++
>  gcc/params.h                        |  2 ++
>  gcc/tree-ssa-loop-prefetch.c        | 16 ++++++++++++++++
>  6 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index ef1b0bc..8736bd9 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -230,6 +230,9 @@ struct cpu_prefetch_tune
>    const int l1_cache_size;
>    const int l1_cache_line_size;
>    const int l2_cache_size;
> +  /* The minimum constant stride beyond which we should use prefetch
> +     hints for.  */
> +  const int minimum_stride;
>    const int default_opt_level;
>  };
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 174310c..0ed9f14 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
>    -1,                  /* l1_cache_size  */
>    -1,                  /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
>
> @@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
>    -1,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
>
> @@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    1024,                        /* l2_cache_size  */
> -  -1                   /* default_opt_level  */
> +  2048,                        /* minimum_stride */
> +  3                    /* default_opt_level  */
>  };
>
>  static const cpu_prefetch_tune thunderxt88_prefetch_tune =
> @@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    128,                 /* l1_cache_line_size  */
>    16*1024,             /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    3                    /* default_opt_level  */
>  };
>
> @@ -583,6 +587,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    128,                 /* l1_cache_line_size  */
>    -1,                  /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
>
> @@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
>    32,                  /* l1_cache_size  */
>    64,                  /* l1_cache_line_size  */
>    256,                 /* l2_cache_size  */
> +  -1,                  /* minimum_stride */
>    -1                   /* default_opt_level  */
>  };
>
> @@ -10461,6 +10467,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
>                             aarch64_tune_params.prefetch->l2_cache_size,
>                             opts->x_param_values,
>                             global_options_set.x_param_values);
> +  if (aarch64_tune_params.prefetch->minimum_stride >= 0)
> +    maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
> +                          aarch64_tune_params.prefetch->minimum_stride,
> +                          opts->x_param_values,
> +                          global_options_set.x_param_values);
>
>    /* Use the alternative scheduling-pressure algorithm by default.  */
>    maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 27c5974..1cb1ef5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10567,6 +10567,21 @@ The size of L1 cache, in kilobytes.
>  @item l2-cache-size
>  The size of L2 cache, in kilobytes.
>
> +@item prefetch-minimum-stride
> +Minimum constant stride, in bytes, to start using prefetch hints for.  If
> +the stride is less than this threshold, prefetch hints will not be issued.
> +
> +This setting is useful for processors that have hardware prefetchers, in
> +which case there may be conflicts between the hardware prefetchers and
> +the software prefetchers.  If the hardware prefetchers have a maximum
> +stride they can handle, it should be used here to improve the use of
> +software prefetchers.
> +
> +A value of -1, the default, means we don't have a threshold and therefore
> +prefetch hints can be issued for any constant stride.
> +
> +This setting is only useful for strides that are known and constant.
> +
>  @item loop-interchange-max-num-stmts
>  The maximum number of stmts in a loop to be interchanged.
>
> diff --git a/gcc/params.def b/gcc/params.def
> index 930b318..bf2d12c 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
>            "The size of L2 cache.",
>            512, 0, 0)
>
> +/* The minimum constant stride beyond which we should use prefetch hints
> +   for.  */
> +
> +DEFPARAM (PARAM_PREFETCH_MINIMUM_STRIDE,
> +         "prefetch-minimum-stride",
> +         "The minimum constant stride beyond which we should use prefetch "
> +         "hints for.",
> +         -1, 0, 0)
> +
>  /* Maximum number of statements in loop nest for loop interchange.  */
>
>  DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
> diff --git a/gcc/params.h b/gcc/params.h
> index 98249d2..96012db 100644
> --- a/gcc/params.h
> +++ b/gcc/params.h
> @@ -196,6 +196,8 @@ extern void init_param_values (int *params);
>    PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
>  #define L2_CACHE_SIZE \
>    PARAM_VALUE (PARAM_L2_CACHE_SIZE)
> +#define PREFETCH_MINIMUM_STRIDE \
> +  PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
>  #define USE_CANONICAL_TYPES \
>    PARAM_VALUE (PARAM_USE_CANONICAL_TYPES)
>  #define IRA_MAX_LOOPS_NUM \
> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
> index 2f10db1..112ccac 100644
> --- a/gcc/tree-ssa-loop-prefetch.c
> +++ b/gcc/tree-ssa-loop-prefetch.c
> @@ -992,6 +992,22 @@ prune_by_reuse (struct mem_ref_group *groups)
>  static bool
>  should_issue_prefetch_p (struct mem_ref *ref)
>  {
> +  /* Some processors may have a hardware prefetcher that may conflict with
> +     prefetch hints for a range of strides.  Make sure we don't issue
> +     prefetches for such cases if the stride is within this particular
> +     range.  */
> +  if (cst_and_fits_in_hwi (ref->group->step)
> +      && absu_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file,
> +                "Step for reference %u:%u (%d) is less than the minimum "
> +                "required stride of %d\n",
> +                ref->group->uid, ref->uid, int_cst_value (ref->group->step),
> +                PREFETCH_MINIMUM_STRIDE);
> +      return false;
> +    }
> +
>    /* For now do not issue prefetches for only first few of the
>       iterations.  */
>    if (ref->prefetch_before != PREFETCH_ALL)
> -- 
> 2.7.4
>
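
For reference, the intent of the new check in should_issue_prefetch_p can be
modelled outside of GCC roughly as below. This is a stand-alone sketch, not
the patch's code: the function name and the use of long long are assumptions,
and it only covers the constant-stride case described above. The comparison
is deliberately done in a signed type so that a threshold of -1 behaves as
"no threshold", which is what the proposed documentation describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-alone model of the stride check; not GCC code.
   STEP is the constant stride in bytes and may be negative for a
   descending walk (assumed not to be LLONG_MIN).  MIN_STRIDE is the
   threshold; a negative value means "no threshold".  */
bool
stride_allows_prefetch (long long step, long long min_stride)
{
  if (min_stride < 0)
    return true;

  /* Signed comparison: the threshold is never converted to a large
     unsigned value.  */
  return llabs (step) >= min_stride;
}
```

Under this model a 64-byte stride is rejected with a threshold of 2048 and
accepted with -1, while strides of 2048 or -4096 bytes are accepted in both
cases.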
