From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: Andrew Stubbs <ams@codesourcery.com>,
gcc-patches@gcc.gnu.org, juzhe.zhong@rivai.ai
Subject: Re: [PATCH][RFT] Vectorization of first-order recurrences
Date: Mon, 17 Oct 2022 09:48:34 +0100 [thread overview]
Message-ID: <mpt5ygibztp.fsf@arm.com> (raw)
In-Reply-To: <271ron87-q0o2-5pr5-8s65-1682p750o0@fhfr.qr> (Richard Biener's message of "Fri, 14 Oct 2022 09:07:57 +0200 (CEST)")
Richard Biener <rguenther@suse.de> writes:
> On Tue, 11 Oct 2022, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Mon, 10 Oct 2022, Andrew Stubbs wrote:
>> >> On 10/10/2022 12:03, Richard Biener wrote:
>> >> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing
>> >> > first order recurrences. That solves two TSVC missed optimization PRs.
>> >> >
>> >> > There's a new scalar cycle def kind, vect_first_order_recurrence
>> >> > and its handling of the backedge value vectorization is complicated
>> >> > by the fact that the vectorized value isn't the PHI but instead
>> >> > a (series of) permute(s) shifting in the recurring value from the
>> >> > previous iteration. I've implemented this by creating both the
>> >> > single vectorized PHI and the series of permutes when vectorizing
>> >> > the scalar PHI, but leaving the backedge values in both unassigned.
>> >> > The backedge values are (for the testcases) computed by a load
>> >> > which is also the place after which the permutes are inserted.
>> >> > That placement also restricts the cases we can handle (without
>> >> > resorting to code motion).
>> >> >
>> >> > I added both costing and SLP handling though SLP handling is
>> >> > restricted to the case where a single vectorized PHI is enough.
>> >> >
>> >> > Missing is epilogue handling - while prologue peeling would
>> >> > be handled transparently by adjusting iv_phi_p the epilogue
>> >> > case doesn't work with just inserting a scalar LC PHI since
>> >> > that a) keeps the scalar load live and b) that load is the
>> >> > wrong one; it has to be the last, much like when we'd vectorize
>> >> > the LC PHI as a live operation. Unfortunately LIVE
>> >> > compute/analysis happens too early, before we decide on
>> >> > peeling. When using fully masked loop vectorization,
>> >> > vect-recurr-6.c works as expected though.
>> >> >
>> >> > I have tested this on x86_64 for now, but since epilogue
>> >> > handling is missing there are probably no practical cases.
>> >> > My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
>> >> > just fine but I didn't feel like running SPEC within SDE nor
>> >> > is the WHILE_ULT patch complete enough. Builds of SPEC 2k7
>> >> > with fully masked loops succeed (minus three cases of
>> >> > PR107096, caused by my WHILE_ULT prototype).
>> >> >
>> >> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >> >
>> >> > Testing with SVE, GCN or RVV appreciated, ideas how to cleanly
>> >> > handle epilogues welcome.
>> >>
>> >> The testcases all produce correct code on GCN and pass the execution tests.
>> >>
>> >> The code isn't terribly optimal because we don't have a two-input permutation
>> >> instruction, so we permute each half separately and vec_merge the results. In
>> >> this case the first vector is always a no-op permutation so that's wasted
>> >> cycles. We'd really want a vector rotate and write-lane (or the other way
>> >> around). I think the special-case permutations can be recognised and coded
>> >> into the backend, but I don't know if we can easily tell that the first vector
>> >> is just a bunch of duplicates, when it's not constant.
>> >
>> > It's not actually a bunch of duplicates in all but the first iteration.
>> > But what you can recognize is that we're only using lane N - 1 of the
>> > first vector, so you could model the permute as extract last
>> > + shift in scalar (the extracted lane). IIRC VLA vector targets usually
>> > have something like shift the vector and set the low lane from a
>> > scalar?
>>
>> Yeah.
>>
>> > The extract lane N - 1 might be more difficult but then
>> > a rotate plus extracting lane 0 might work as well.
>>
>> I guess for SVE we should probably use SPLICE, which joins two vectors
>> and uses a predicate to select the first element that should be extracted.
>>
>> Unfortunately we don't have a way of representing "last bit set, all other
>> bits clear" as a constant though, so I guess it'll have to be hidden
>> behind unspecs.
>>
>> I meant to start SVE tests running once I'd finished for the day yesterday,
>> but forgot, sorry. Will try to test today.
>>
>> On the patch:
>>
>> + /* This is the second phase of vectorizing first-order recurrences. An
>> + overview of the transformation is described below. Suppose we have the
>> + following loop.
>> +
>> + int32_t t = 0;
>> + for (int i = 0; i < n; ++i)
>> + {
>> + b[i] = a[i] - t;
>> + t = a[i];
>> + }
>> +
>> + There is a first-order recurrence on "t". For this loop, the shorthand
>> + scalar IR looks like:
>> +
>> + scalar.preheader:
>> + init = a[-1]
>> + br loop.body
>> +
>> + scalar.body:
>> + i = PHI <0(scalar.preheader), i+1(scalar.body)>
>> + _2 = PHI <init(scalar.preheader), _1(scalar.body)>
>> + _1 = a[i]
>> + b[i] = _1 - _2
>> + br cond, scalar.body, ...
>> +
>> + In this example, _2 is a recurrence because its value depends on the
>> + previous iteration. In the first phase of vectorization, we created a
>> + temporary value for _2. We now complete the vectorization and produce the
>> + shorthand vector IR shown below (VF = 4).
>> +
>> + vector.preheader:
>> + vect_init = vect_cst(..., ..., ..., a[-1])
>> + br vector.body
>> +
>> + vector.body
>> + i = PHI <0(vector.preheader), i+4(vector.body)>
>> + vect_1 = PHI <vect_init(vector.preheader), v2(vector.body)>
>> + vect_2 = a[i, i+1, i+2, i+3];
>> + vect_3 = vector(vect_1(3), vect_2(0, 1, 2))
>> + b[i, i+1, i+2, i+3] = vect_2 - vect_3
>> + br cond, vector.body, middle.block
>> +
>> + middle.block:
>> + x = vect_2(3)
>> + br scalar.preheader
>> +
>> + scalar.ph:
>> + s_init = PHI <x(middle.block), a[-1], otherwise>
>> + br scalar.body
>> +
>> + After execution completes the vector loop, we extract the next value of
>> + the recurrence (x) to use as the initial value in the scalar loop. */
>>
>> Looks like a[-1] should be zero in the example (or t should be initialised
>> to a[-1]).
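FWIW, to convince myself the permute semantics (and the init-value fix)
are right, I put together this throwaway C model of the VF=4
transformation; names like vect_1/vect_2/vect_3 follow the comment,
everything else is mine rather than anything the patch generates:

```c
#include <assert.h>
#include <string.h>

/* Scalar reference: first-order recurrence seeded with INIT.  */
static void
scalar_loop (const int *a, int *b, int n, int init)
{
  int t = init;
  for (int i = 0; i < n; ++i)
    {
      b[i] = a[i] - t;
      t = a[i];
    }
}

/* Hand-"vectorized" version for VF = 4: vect_3 shifts the last lane of
   the previous vector in front of the first three lanes of the current
   one, i.e. VEC_PERM <vect_1, vect_2, {3, 4, 5, 6}>.  N must be a
   multiple of 4 here (no epilogue, as in the patch).  */
static void
vector_loop (const int *a, int *b, int n, int init)
{
  int vect_1[4] = { init, init, init, init };  /* only lane 3 is live */
  for (int i = 0; i < n; i += 4)
    {
      int vect_2[4], vect_3[4];
      memcpy (vect_2, a + i, sizeof vect_2);
      vect_3[0] = vect_1[3];
      vect_3[1] = vect_2[0];
      vect_3[2] = vect_2[1];
      vect_3[3] = vect_2[2];
      for (int l = 0; l < 4; ++l)
        b[i + l] = vect_2[l] - vect_3[l];
      memcpy (vect_1, vect_2, sizeof vect_1);  /* backedge value */
    }
}

int
check_recurrence_equiv (void)
{
  int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  int bs[8], bv[8];
  scalar_loop (a, bs, 8, 7);
  vector_loop (a, bv, 8, 7);
  return memcmp (bs, bv, sizeof bs) == 0;
}
```

The two agree for any init value once the preheader splat uses the same
seed as the scalar loop, which is exactly the correctness point above.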
>
> Indeed. I've promoted the comment to function level and adjusted it
> to reflect what is done now (I had missed updating it from Ju-Zhe's
> prototype). I have also applied your correctness fix.
>
> Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> OK (given limited x86 test quality)?
LGTM FWIW.
Thanks,
Richard
>
> Thanks,
> Richard.
>
> From 054bb007b09334402f018a4b2b65aef150cd8182 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Thu, 6 Oct 2022 13:56:09 +0200
> Subject: [PATCH] Vectorization of first-order recurrences
> To: gcc-patches@gcc.gnu.org
>
> The following picks up the prototype by Ju-Zhe Zhong for vectorizing
> first order recurrences. That solves two TSVC missed optimization PRs.
>
> There's a new scalar cycle def kind, vect_first_order_recurrence
> and its handling of the backedge value vectorization is complicated
> by the fact that the vectorized value isn't the PHI but instead
> a (series of) permute(s) shifting in the recurring value from the
> previous iteration. I've implemented this by creating both the
> single vectorized PHI and the series of permutes when vectorizing
> the scalar PHI, but leaving the backedge values in both unassigned.
> The backedge values are (for the testcases) computed by a load
> which is also the place after which the permutes are inserted.
> That placement also restricts the cases we can handle (without
> resorting to code motion).
>
> I added both costing and SLP handling though SLP handling is
> restricted to the case where a single vectorized PHI is enough.
>
> Missing is epilogue handling - while prologue peeling would
> be handled transparently by adjusting iv_phi_p the epilogue
> case doesn't work with just inserting a scalar LC PHI since
> that a) keeps the scalar load live and b) that load is the
> wrong one; it has to be the last, much like when we'd vectorize
> the LC PHI as a live operation. Unfortunately LIVE
> compute/analysis happens too early, before we decide on
> peeling. When using fully masked loop vectorization,
> vect-recurr-6.c works as expected though.
>
> I have tested this on x86_64 for now, but since epilogue
> handling is missing there are probably no practical cases.
> My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
> just fine but I didn't feel like running SPEC within SDE nor
> is the WHILE_ULT patch complete enough.
>
> PR tree-optimization/99409
> PR tree-optimization/99394
> * tree-vectorizer.h (vect_def_type::vect_first_order_recurrence): Add.
> (stmt_vec_info_type::recurr_info_type): Likewise.
> (vectorizable_recurr): New function.
> * tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New
> function.
> (vect_analyze_scalar_cycles_1): Look for first order
> recurrences.
> (vect_analyze_loop_operations): Handle them.
> (vect_transform_loop): Likewise.
> (vectorizable_recurr): New function.
> (maybe_set_vectorized_backedge_value): Handle the backedge value
> setting in the first order recurrence PHI and the permutes.
> * tree-vect-stmts.cc (vect_analyze_stmt): Handle first order
> recurrences.
> (vect_transform_stmt): Likewise.
> (vect_is_simple_use): Likewise.
> (vect_is_simple_use): Likewise.
> * tree-vect-slp.cc (vect_get_and_check_slp_defs): Likewise.
> (vect_build_slp_tree_2): Likewise.
> (vect_schedule_scc): Handle the backedge value setting in the
> first order recurrence PHI and the permutes.
>
> * gcc.dg/vect/vect-recurr-1.c: New testcase.
> * gcc.dg/vect/vect-recurr-2.c: Likewise.
> * gcc.dg/vect/vect-recurr-3.c: Likewise.
> * gcc.dg/vect/vect-recurr-4.c: Likewise.
> * gcc.dg/vect/vect-recurr-5.c: Likewise.
> * gcc.dg/vect/vect-recurr-6.c: Likewise.
> * gcc.dg/vect/tsvc/vect-tsvc-s252.c: Un-XFAIL.
> * gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise.
> * gcc.dg/vect/tsvc/vect-tsvc-s291.c: Likewise.
>
> Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> ---
> .../gcc.dg/vect/tsvc/vect-tsvc-s252.c | 2 +-
> .../gcc.dg/vect/tsvc/vect-tsvc-s254.c | 2 +-
> .../gcc.dg/vect/tsvc/vect-tsvc-s291.c | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-recurr-1.c | 38 +++
> gcc/testsuite/gcc.dg/vect/vect-recurr-2.c | 39 +++
> gcc/testsuite/gcc.dg/vect/vect-recurr-3.c | 39 +++
> gcc/testsuite/gcc.dg/vect/vect-recurr-4.c | 42 +++
> gcc/testsuite/gcc.dg/vect/vect-recurr-5.c | 43 +++
> gcc/testsuite/gcc.dg/vect/vect-recurr-6.c | 39 +++
> gcc/tree-vect-loop.cc | 281 ++++++++++++++++--
> gcc/tree-vect-slp.cc | 38 ++-
> gcc/tree-vect-stmts.cc | 17 +-
> gcc/tree-vectorizer.h | 4 +
> 13 files changed, 558 insertions(+), 28 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s252.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s252.c
> index f1302b60ae5..83eaa7a8ff5 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s252.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s252.c
> @@ -40,4 +40,4 @@ int main (int argc, char **argv)
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s254.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s254.c
> index bdc8a01e2a5..06e9b0a849d 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s254.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s254.c
> @@ -39,4 +39,4 @@ int main (int argc, char **argv)
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s291.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s291.c
> index 0b474c2e81a..91cdc121095 100644
> --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s291.c
> +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s291.c
> @@ -39,4 +39,4 @@ int main (int argc, char **argv)
> return 0;
> }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
> new file mode 100644
> index 00000000000..6eb59fdf854
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
> @@ -0,0 +1,38 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c)
> +{
> + int t = *c;
> + for (int i = 0; i < 64; ++i)
> + {
> + b[i] = a[i] - t;
> + t = a[i];
> + }
> +}
> +
> +int a[64], b[64];
> +
> +int
> +main ()
> +{
> + check_vect ();
> + for (int i = 0; i < 64; ++i)
> + {
> + a[i] = i;
> + __asm__ volatile ("" ::: "memory");
> + }
> + int c = 7;
> + foo (a, b, &c);
> + for (int i = 1; i < 64; ++i)
> + if (b[i] != a[i] - a[i-1])
> + abort ();
> + if (b[0] != -7)
> + abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
> new file mode 100644
> index 00000000000..97efaaa38bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c)
> +{
> + int t = *c;
> + for (int i = 0; i < 64; ++i)
> + {
> + b[i] = a[i] - t;
> + t = a[i];
> + }
> +}
> +
> +int a[64];
> +short b[64];
> +
> +int
> +main ()
> +{
> + check_vect ();
> + for (int i = 0; i < 64; ++i)
> + {
> + a[i] = i;
> + __asm__ volatile ("" ::: "memory");
> + }
> + int c = 7;
> + foo (a, b, &c);
> + for (int i = 1; i < 64; ++i)
> + if (b[i] != a[i] - a[i-1])
> + abort ();
> + if (b[0] != -7)
> + abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
> new file mode 100644
> index 00000000000..621a5d8a257
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (int * __restrict__ a, signed char * __restrict__ b, int * __restrict__ c)
> +{
> + int t = *c;
> + for (int i = 0; i < 64; ++i)
> + {
> + b[i] = a[i] - t;
> + t = a[i];
> + }
> +}
> +
> +int a[64];
> +signed char b[64];
> +
> +int
> +main ()
> +{
> + check_vect ();
> + for (int i = 0; i < 64; ++i)
> + {
> + a[i] = i;
> + __asm__ volatile ("" ::: "memory");
> + }
> + int c = 7;
> + foo (a, b, &c);
> + for (int i = 1; i < 64; ++i)
> + if (b[i] != a[i] - a[i-1])
> + abort ();
> + if (b[0] != -7)
> + abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
> new file mode 100644
> index 00000000000..f6dbc494a62
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c)
> +{
> + int t1 = *c;
> + int t2 = *c;
> + for (int i = 0; i < 64; i+=2)
> + {
> + b[i] = a[i] - t1;
> + t1 = a[i];
> + b[i+1] = a[i+1] - t2;
> + t2 = a[i+1];
> + }
> +}
> +
> +int a[64], b[64];
> +
> +int
> +main ()
> +{
> + check_vect ();
> + for (int i = 0; i < 64; ++i)
> + {
> + a[i] = i;
> + __asm__ volatile ("" ::: "memory");
> + }
> + int c = 7;
> + foo (a, b, &c);
> + for (int i = 2; i < 64; i+=2)
> + if (b[i] != a[i] - a[i-2]
> + || b[i+1] != a[i+1] - a[i-1])
> + abort ();
> + if (b[0] != -7 || b[1] != -6)
> + abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
> new file mode 100644
> index 00000000000..19c56df9e83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c)
> +{
> + int t1 = *c;
> + int t2 = *c;
> + for (int i = 0; i < 64; i+=2)
> + {
> + b[i] = a[i] - t1;
> + t1 = a[i];
> + b[i+1] = a[i+1] - t2;
> + t2 = a[i+1];
> + }
> +}
> +
> +int a[64];
> +short b[64];
> +
> +int
> +main ()
> +{
> + check_vect ();
> + for (int i = 0; i < 64; ++i)
> + {
> + a[i] = i;
> + __asm__ volatile ("" ::: "memory");
> + }
> + int c = 7;
> + foo (a, b, &c);
> + for (int i = 2; i < 64; i+=2)
> + if (b[i] != a[i] - a[i-2]
> + || b[i+1] != a[i+1] - a[i-1])
> + abort ();
> + if (b[0] != -7 || b[1] != -6)
> + abort ();
> + return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
> new file mode 100644
> index 00000000000..e7712680853
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c, int n)
> +{
> + int t = *c;
> + for (int i = 0; i < n; ++i)
> + {
> + b[i] = a[i] - t;
> + t = a[i];
> + }
> +}
> +
> +int a[64], b[64];
> +
> +int
> +main ()
> +{
> + check_vect ();
> + for (int i = 0; i < 64; ++i)
> + {
> + a[i] = i;
> + __asm__ volatile ("" ::: "memory");
> + }
> + int c = 7;
> + foo (a, b, &c, 63);
> + for (int i = 1; i < 63; ++i)
> + if (b[i] != a[i] - a[i-1])
> + abort ();
> + if (b[0] != -7)
> + abort ();
> + return 0;
> +}
> +
> +/* ??? We miss epilogue handling for first order recurrences. */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target vect_fully_masked } } } */
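For the record, this is why n = 63 needs either masking or the missing
epilogue handling: with VF = 4 an unmasked vector loop covers 60
elements, and the scalar epilogue has to continue the recurrence from
a[59] (the last lane of the final vector, the "x" in the middle block),
not from the stale initial *c. A plain-C sketch of what correct
epilogue handling would have to compute (scalarized, my naming):

```c
#include <assert.h>

/* Main "vector" loop over whole groups of 4, then a scalar epilogue
   seeded with the carried-over t, i.e. the last lane of the final
   vector iteration.  */
static void
foo_with_epilogue (const int *a, int *b, int n, int init)
{
  int t = init;
  int i = 0;
  for (; i + 4 <= n; i += 4)     /* vector loop, scalarized here */
    for (int l = 0; l < 4; ++l)
      {
        b[i + l] = a[i + l] - t;
        t = a[i + l];
      }
  /* "middle block": t now holds the last lane of the final vector.  */
  for (; i < n; ++i)             /* scalar epilogue */
    {
      b[i] = a[i] - t;
      t = a[i];
    }
}

int
epilogue_matches (void)
{
  int a[63], bs[63], bv[63];
  for (int i = 0; i < 63; ++i)
    a[i] = i * i;
  int t = 7;
  for (int i = 0; i < 63; ++i)   /* scalar reference */
    {
      bs[i] = a[i] - t;
      t = a[i];
    }
  foo_with_epilogue (a, bv, 63, 7);
  for (int i = 0; i < 63; ++i)
    if (bs[i] != bv[i])
      return 0;
  return 1;
}
```

Seeding the epilogue with init instead of the carried t would get b[60]
wrong, which matches the a)/b) problems described in the commit message.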
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 98a943d8a4b..63e86540d12 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -529,6 +529,45 @@ vect_inner_phi_in_double_reduction_p (loop_vec_info loop_vinfo, gphi *phi)
> return false;
> }
>
> +/* Returns true if PHI is a first-order recurrence. A first-order
> + recurrence is a non-reduction recurrence relation in which the value of
> + the recurrence in the current loop iteration equals a value defined in
> + the previous iteration. */
> +
> +static bool
> +vect_phi_first_order_recurrence_p (loop_vec_info loop_vinfo, class loop *loop,
> + gphi *phi)
> +{
> + /* Ensure the loop latch definition is from within the loop. */
> + edge latch = loop_latch_edge (loop);
> + tree ldef = PHI_ARG_DEF_FROM_EDGE (phi, latch);
> + if (TREE_CODE (ldef) != SSA_NAME
> + || SSA_NAME_IS_DEFAULT_DEF (ldef)
> + || !flow_bb_inside_loop_p (loop, gimple_bb (SSA_NAME_DEF_STMT (ldef))))
> + return false;
> +
> + tree def = gimple_phi_result (phi);
> +
> + /* Ensure every use_stmt of the phi node is dominated by the latch
> + definition. */
> + imm_use_iterator imm_iter;
> + use_operand_p use_p;
> + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
> + if (!is_gimple_debug (USE_STMT (use_p))
> + && (SSA_NAME_DEF_STMT (ldef) == USE_STMT (use_p)
> + || !vect_stmt_dominates_stmt_p (SSA_NAME_DEF_STMT (ldef),
> + USE_STMT (use_p))))
> + return false;
> +
> + /* First-order recurrence autovectorization needs a shuffle vector. */
> + tree scalar_type = TREE_TYPE (def);
> + tree vectype = get_vectype_for_scalar_type (loop_vinfo, scalar_type);
> + if (!vectype)
> + return false;
> +
> + return true;
> +}
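One note for testers on the dominance requirement above: if I'm reading
the check right, it is what rules out loops where the old value is used
by a statement that the latch definition doesn't dominate, since the
permutes are inserted right after the latch definition. Two loop shapes
in plain C (names are mine):

```c
#include <assert.h>

/* Shape A: vectorizable.  In GIMPLE the shared load _1 = a[i] is the
   latch definition and precedes the only use of t, so the permute can
   be inserted right after it.  */
static void
shape_a (const int *a, int *b, int n, int t)
{
  for (int i = 0; i < n; ++i)
    {
      b[i] = a[i] - t;   /* use of t, dominated by the load of a[i] */
      t = a[i];          /* latch definition */
    }
}

/* Shape B: rejected without code motion.  The use of t reads c[i], so
   it is not dominated by the load of a[i] that defines the next t;
   the permute would have to be hoisted above its own input.  */
static void
shape_b (const int *a, const int *c, int *b, int n, int t)
{
  for (int i = 0; i < n; ++i)
    {
      b[i] = c[i] - t;   /* use of t before the latch definition */
      t = a[i];
    }
}

int
shapes_ok (void)
{
  int a[4] = { 1, 2, 3, 4 }, c[4] = { 10, 20, 30, 40 }, b[4];
  shape_a (a, b, 4, 7);
  if (b[0] != -6 || b[3] != 1)
    return 0;
  shape_b (a, c, b, 4, 7);
  return b[0] == 3 && b[3] == 37;
}
```

Both shapes are perfectly fine scalar code, of course; the difference
only matters for where the permutes can be placed.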
> +
> /* Function vect_analyze_scalar_cycles_1.
>
> Examine the cross iteration def-use cycles of scalar variables
> @@ -666,6 +705,8 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop,
> }
> }
> }
> + else if (vect_phi_first_order_recurrence_p (loop_vinfo, loop, phi))
> + STMT_VINFO_DEF_TYPE (stmt_vinfo) = vect_first_order_recurrence;
> else
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -1810,7 +1851,8 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
>
> if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope
> || STMT_VINFO_LIVE_P (stmt_info))
> - && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def)
> + && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def
> + && STMT_VINFO_DEF_TYPE (stmt_info) != vect_first_order_recurrence)
> /* A scalar-dependence cycle that we don't support. */
> return opt_result::failure_at (phi,
> "not vectorized:"
> @@ -1831,6 +1873,11 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
> && ! PURE_SLP_STMT (stmt_info))
> ok = vectorizable_reduction (loop_vinfo,
> stmt_info, NULL, NULL, &cost_vec);
> + else if ((STMT_VINFO_DEF_TYPE (stmt_info)
> + == vect_first_order_recurrence)
> + && ! PURE_SLP_STMT (stmt_info))
> + ok = vectorizable_recurr (loop_vinfo, stmt_info, NULL, NULL,
> + &cost_vec);
> }
>
> /* SLP PHIs are tested by vect_slp_analyze_node_operations. */
> @@ -8290,6 +8337,178 @@ vectorizable_phi (vec_info *,
> return true;
> }
>
> +/* Vectorizes first order recurrences. An overview of the transformation
> + is described below. Suppose we have the following loop.
> +
> + int t = 0;
> + for (int i = 0; i < n; ++i)
> + {
> + b[i] = a[i] - t;
> + t = a[i];
> + }
> +
> + There is a first-order recurrence on 't'. For this loop, the scalar IR
> + looks (simplified) like:
> +
> + scalar.preheader:
> + init = 0;
> +
> + scalar.body:
> + i = PHI <0(scalar.preheader), i+1(scalar.body)>
> + _2 = PHI <init(scalar.preheader), _1(scalar.body)>
> + _1 = a[i]
> + b[i] = _1 - _2
> + if (i < n) goto scalar.body
> +
> + In this example, _2 is a recurrence because its value depends on the
> + previous iteration. We vectorize this as (VF = 4)
> +
> + vector.preheader:
> + vect_init = vect_cst(..., ..., ..., 0)
> +
> + vector.body
> + i = PHI <0(vector.preheader), i+4(vector.body)>
> + vect_1 = PHI <vect_init(vector.preheader), v2(vector.body)>
> + vect_2 = a[i, i+1, i+2, i+3];
> + vect_3 = vec_perm (vect_1, vect_2, { 3, 4, 5, 6 })
> + b[i, i+1, i+2, i+3] = vect_2 - vect_3
> + if (..) goto vector.body
> +
> + In this function, vectorizable_recurr, we code generate both the
> + vector PHI node and the permute since those together compute the
> + vectorized value of the scalar PHI. We do not yet have the
> + backedge value to fill in there nor into the vec_perm. Those
> + are filled in maybe_set_vectorized_backedge_value and
> + vect_schedule_scc.
> +
> + TODO: Since the scalar loop does not have a use of the recurrence
> + outside of the loop the natural way to implement peeling via
> + vectorizing the live value doesn't work. For now peeling of loops
> + with a recurrence is not implemented. For SLP the supported cases
> + are restricted to those requiring a single vector recurrence PHI. */
> +
> +bool
> +vectorizable_recurr (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
> + gimple **vec_stmt, slp_tree slp_node,
> + stmt_vector_for_cost *cost_vec)
> +{
> + if (!loop_vinfo || !is_a<gphi *> (stmt_info->stmt))
> + return false;
> +
> + gphi *phi = as_a<gphi *> (stmt_info->stmt);
> +
> + /* So far we only support first-order recurrence auto-vectorization. */
> + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_first_order_recurrence)
> + return false;
> +
> + tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> + unsigned ncopies;
> + if (slp_node)
> + ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> + else
> + ncopies = vect_get_num_copies (loop_vinfo, vectype);
> + poly_int64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> + unsigned dist = slp_node ? SLP_TREE_LANES (slp_node) : 1;
> + /* We need to be able to make progress with a single vector. */
> + if (maybe_gt (dist * 2, nunits))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "first order recurrence exceeds half of "
> + "a vector\n");
> + return false;
> + }
> +
> + /* First-order recurrence autovectorization needs to handle permutation
> + with indices = [nunits-1, nunits, nunits+1, ...]. */
> + vec_perm_builder sel (nunits, 1, 3);
> + for (int i = 0; i < 3; ++i)
> + sel.quick_push (nunits - dist + i);
> + vec_perm_indices indices (sel, 2, nunits);
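If I read the encoding right, the three pushed elements describe a
single series that vec_perm_indices extends linearly, so the effective
selector is {nunits-dist, nunits-dist+1, ...}, e.g. {3, 4, 5, 6} for
nunits = 4, dist = 1: lane 3 of the first input followed by lanes 0-2
of the second. A throwaway C check of that expansion (my names, not
GCC's API):

```c
#include <assert.h>

/* Effective permute selector for the recurrence: one linear series
   starting at nunits - dist with step 1.  Indices >= nunits select
   lanes of the second input of the two-input permute.  */
static void
recurr_perm_indices (int nunits, int dist, int *out)
{
  for (int i = 0; i < nunits; ++i)
    out[i] = nunits - dist + i;
}

/* Returns 1 iff the expanded selector matches EXPECT.  */
int
check_indices (int nunits, int dist, const int *expect)
{
  int idx[64];
  recurr_perm_indices (nunits, dist, idx);
  for (int i = 0; i < nunits; ++i)
    if (idx[i] != expect[i])
      return 0;
  return 1;
}
```

The dist > 1 case is the SLP one, where a whole group of SLP_TREE_LANES
values shifts in from the previous iteration.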
> +
> + if (!vec_stmt) /* transformation not required. */
> + {
> + if (!can_vec_perm_const_p (TYPE_MODE (vectype), TYPE_MODE (vectype),
> + indices))
> + return false;
> +
> + if (slp_node)
> + {
> + /* We eventually need to set a vector type on invariant
> + arguments. */
> + unsigned j;
> + slp_tree child;
> + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), j, child)
> + if (!vect_maybe_update_slp_op_vectype
> + (child, SLP_TREE_VECTYPE (slp_node)))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "incompatible vector types for "
> + "invariants\n");
> + return false;
> + }
> + }
> + /* The recurrence costs the initialization vector and one permute
> + for each copy. */
> + unsigned prologue_cost = record_stmt_cost (cost_vec, 1, scalar_to_vec,
> + stmt_info, 0, vect_prologue);
> + unsigned inside_cost = record_stmt_cost (cost_vec, ncopies, vector_stmt,
> + stmt_info, 0, vect_body);
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "vectorizable_recurr: inside_cost = %d, "
> + "prologue_cost = %d .\n", inside_cost,
> + prologue_cost);
> +
> + STMT_VINFO_TYPE (stmt_info) = recurr_info_type;
> + return true;
> + }
> +
> + edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
> + basic_block bb = gimple_bb (phi);
> + tree preheader = PHI_ARG_DEF_FROM_EDGE (phi, pe);
> + tree vec_init = build_vector_from_val (vectype, preheader);
> + vec_init = vect_init_vector (loop_vinfo, stmt_info, vec_init, vectype, NULL);
> +
> + /* Create the vectorized first-order PHI node. */
> + tree vec_dest = vect_get_new_vect_var (vectype,
> + vect_simple_var, "vec_recur_");
> + gphi *new_phi = create_phi_node (vec_dest, bb);
> + add_phi_arg (new_phi, vec_init, pe, UNKNOWN_LOCATION);
> +
> + /* Insert the shuffles for the first-order recurrence autovectorization:
> + result = VEC_PERM <vec_recur, vect_1, index[nunits-1, nunits, ...]>. */
> + tree perm = vect_gen_perm_mask_checked (vectype, indices);
> +
> + /* Insert the required permute after the latch definition. The
> + second and later operands are tentative and will be updated when we have
> + vectorized the latch definition. */
> + edge le = loop_latch_edge (LOOP_VINFO_LOOP (loop_vinfo));
> + gimple_stmt_iterator gsi2
> + = gsi_for_stmt (SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE (phi, le)));
> + gsi_next (&gsi2);
> +
> + for (unsigned i = 0; i < ncopies; ++i)
> + {
> + vec_dest = make_ssa_name (vectype);
> + gassign *vperm
> + = gimple_build_assign (vec_dest, VEC_PERM_EXPR,
> + i == 0 ? gimple_phi_result (new_phi) : NULL,
> + NULL, perm);
> + vect_finish_stmt_generation (loop_vinfo, stmt_info, vperm, &gsi2);
> +
> + if (slp_node)
> + SLP_TREE_VEC_STMTS (slp_node).quick_push (vperm);
> + else
> + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (vperm);
> + }
> +
> + if (!slp_node)
> + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> + return true;
> +}
> +
> /* Return true if VECTYPE represents a vector that requires lowering
> by the vector lowering pass. */
>
> @@ -10242,27 +10461,53 @@ maybe_set_vectorized_backedge_value (loop_vec_info loop_vinfo,
> imm_use_iterator iter;
> use_operand_p use_p;
> FOR_EACH_IMM_USE_FAST (use_p, iter, def)
> - if (gphi *phi = dyn_cast <gphi *> (USE_STMT (use_p)))
> - if (gimple_bb (phi)->loop_father->header == gimple_bb (phi)
> - && (phi_info = loop_vinfo->lookup_stmt (phi))
> - && STMT_VINFO_RELEVANT_P (phi_info)
> - && VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (phi_info))
> + {
> + gphi *phi = dyn_cast <gphi *> (USE_STMT (use_p));
> + if (!phi)
> + continue;
> + if (!(gimple_bb (phi)->loop_father->header == gimple_bb (phi)
> + && (phi_info = loop_vinfo->lookup_stmt (phi))
> + && STMT_VINFO_RELEVANT_P (phi_info)))
> + continue;
> + loop_p loop = gimple_bb (phi)->loop_father;
> + edge e = loop_latch_edge (loop);
> + if (PHI_ARG_DEF_FROM_EDGE (phi, e) != def)
> + continue;
> +
> + if (VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (phi_info))
> && STMT_VINFO_REDUC_TYPE (phi_info) != FOLD_LEFT_REDUCTION
> && STMT_VINFO_REDUC_TYPE (phi_info) != EXTRACT_LAST_REDUCTION)
> {
> - loop_p loop = gimple_bb (phi)->loop_father;
> - edge e = loop_latch_edge (loop);
> - if (PHI_ARG_DEF_FROM_EDGE (phi, e) == def)
> + vec<gimple *> &phi_defs = STMT_VINFO_VEC_STMTS (phi_info);
> + vec<gimple *> &latch_defs = STMT_VINFO_VEC_STMTS (def_stmt_info);
> + gcc_assert (phi_defs.length () == latch_defs.length ());
> + for (unsigned i = 0; i < phi_defs.length (); ++i)
> + add_phi_arg (as_a <gphi *> (phi_defs[i]),
> + gimple_get_lhs (latch_defs[i]), e,
> + gimple_phi_arg_location (phi, e->dest_idx));
> + }
> + else if (STMT_VINFO_DEF_TYPE (phi_info) == vect_first_order_recurrence)
> + {
> + /* For first order recurrences we have to update both uses of
> + the latch definition, the one in the PHI node and the one
> + in the generated VEC_PERM_EXPR. */
> + vec<gimple *> &phi_defs = STMT_VINFO_VEC_STMTS (phi_info);
> + vec<gimple *> &latch_defs = STMT_VINFO_VEC_STMTS (def_stmt_info);
> + gcc_assert (phi_defs.length () == latch_defs.length ());
> + tree phidef = gimple_assign_rhs1 (phi_defs[0]);
> + gphi *vphi = as_a <gphi *> (SSA_NAME_DEF_STMT (phidef));
> + for (unsigned i = 0; i < phi_defs.length (); ++i)
> {
> - vec<gimple *> &phi_defs = STMT_VINFO_VEC_STMTS (phi_info);
> - vec<gimple *> &latch_defs = STMT_VINFO_VEC_STMTS (def_stmt_info);
> - gcc_assert (phi_defs.length () == latch_defs.length ());
> - for (unsigned i = 0; i < phi_defs.length (); ++i)
> - add_phi_arg (as_a <gphi *> (phi_defs[i]),
> - gimple_get_lhs (latch_defs[i]), e,
> - gimple_phi_arg_location (phi, e->dest_idx));
> + gassign *perm = as_a <gassign *> (phi_defs[i]);
> + if (i > 0)
> + gimple_assign_set_rhs1 (perm, gimple_get_lhs (latch_defs[i-1]));
> + gimple_assign_set_rhs2 (perm, gimple_get_lhs (latch_defs[i]));
> + update_stmt (perm);
> }
> + add_phi_arg (vphi, gimple_get_lhs (latch_defs.last ()), e,
> + gimple_phi_arg_location (phi, e->dest_idx));
> }
> + }
> }
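The chaining in the first-order-recurrence arm took me a moment: copy
i's permute reads latch def i-1 (as rhs1) and latch def i (as rhs2),
and only the vector PHI's backedge uses the last latch def. A
throwaway C model of ncopies = 2, VF = 4, assuming my reading of the
wiring is right:

```c
#include <assert.h>
#include <string.h>

/* Two vector copies per iteration: permute 0 reads the PHI result and
   latch def 0, permute 1 reads latch defs 0 and 1; the PHI backedge is
   fed by the last latch def.  */
static void
two_copy_loop (const int *a, int *b, int n, int init)
{
  int phi[4] = { init, init, init, init };
  for (int i = 0; i < n; i += 8)
    {
      const int *l0 = a + i;             /* latch def 0 */
      const int *l1 = a + i + 4;         /* latch def 1 */
      int p0[4] = { phi[3], l0[0], l0[1], l0[2] };
      int p1[4] = { l0[3], l1[0], l1[1], l1[2] };
      for (int l = 0; l < 4; ++l)
        {
          b[i + l] = l0[l] - p0[l];
          b[i + 4 + l] = l1[l] - p1[l];
        }
      memcpy (phi, l1, sizeof phi);      /* backedge: last latch def */
    }
}

int
two_copy_matches (void)
{
  int a[16], bs[16], bv[16];
  for (int i = 0; i < 16; ++i)
    a[i] = 3 * i + 1;
  int t = 5;
  for (int i = 0; i < 16; ++i)          /* scalar reference */
    {
      bs[i] = a[i] - t;
      t = a[i];
    }
  two_copy_loop (a, bv, 16, 5);
  return memcmp (bs, bv, sizeof bs) == 0;
}
```

That also makes it clear why the vector PHI needs just one copy even
when ncopies > 1: each later permute takes its recurring lane from the
previous copy's latch def directly.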
>
> /* Vectorize STMT_INFO if relevant, inserting any new instructions before GSI.
> @@ -10671,6 +10916,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
> || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def
> || STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
> + || STMT_VINFO_DEF_TYPE (stmt_info) == vect_first_order_recurrence
> || STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def)
> && ! PURE_SLP_STMT (stmt_info))
> {
> @@ -10696,7 +10942,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
> || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def
> || STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
> - || STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def)
> + || STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def
> + || STMT_VINFO_DEF_TYPE (stmt_info) == vect_first_order_recurrence)
> && ! PURE_SLP_STMT (stmt_info))
> maybe_set_vectorized_backedge_value (loop_vinfo, stmt_info);
> }
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 229f2663ebc..cea5d50da92 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -693,6 +693,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> case vect_reduction_def:
> case vect_induction_def:
> case vect_nested_cycle:
> + case vect_first_order_recurrence:
> break;
>
> default:
> @@ -1732,7 +1733,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> }
> else if (def_type == vect_reduction_def
> || def_type == vect_double_reduction_def
> - || def_type == vect_nested_cycle)
> + || def_type == vect_nested_cycle
> + || def_type == vect_first_order_recurrence)
> {
> /* Else def types have to match. */
> stmt_vec_info other_info;
> @@ -1746,7 +1748,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
> }
> class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> /* Reduction initial values are not explicitly represented. */
> - if (!nested_in_vect_loop_p (loop, stmt_info))
> + if (def_type != vect_first_order_recurrence
> + && !nested_in_vect_loop_p (loop, stmt_info))
> skip_args[loop_preheader_edge (loop)->dest_idx] = true;
> /* Reduction chain backedge defs are filled manually.
> ??? Need a better way to identify a SLP reduction chain PHI.
> @@ -9187,11 +9190,34 @@ vect_schedule_scc (vec_info *vinfo, slp_tree node, slp_instance instance,
> child = SLP_TREE_CHILDREN (phi_node)[dest_idx];
> if (!child || SLP_TREE_DEF_TYPE (child) != vect_internal_def)
> continue;
> + unsigned n = SLP_TREE_VEC_STMTS (phi_node).length ();
> /* Simply fill all args. */
> - for (unsigned i = 0; i < SLP_TREE_VEC_STMTS (phi_node).length (); ++i)
> - add_phi_arg (as_a <gphi *> (SLP_TREE_VEC_STMTS (phi_node)[i]),
> - vect_get_slp_vect_def (child, i),
> - e, gimple_phi_arg_location (phi, dest_idx));
> + if (STMT_VINFO_DEF_TYPE (SLP_TREE_REPRESENTATIVE (phi_node))
> + != vect_first_order_recurrence)
> + for (unsigned i = 0; i < n; ++i)
> + add_phi_arg (as_a <gphi *> (SLP_TREE_VEC_STMTS (phi_node)[i]),
> + vect_get_slp_vect_def (child, i),
> + e, gimple_phi_arg_location (phi, dest_idx));
> + else
> + {
> + /* Unless it is a first order recurrence which needs
> + args filled in for both the PHI node and the permutes. */
> + gimple *perm = SLP_TREE_VEC_STMTS (phi_node)[0];
> + gimple *rphi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (perm));
> + add_phi_arg (as_a <gphi *> (rphi),
> + vect_get_slp_vect_def (child, n - 1),
> + e, gimple_phi_arg_location (phi, dest_idx));
> + for (unsigned i = 0; i < n; ++i)
> + {
> + gimple *perm = SLP_TREE_VEC_STMTS (phi_node)[i];
> + if (i > 0)
> + gimple_assign_set_rhs1 (perm,
> + vect_get_slp_vect_def (child, i - 1));
> + gimple_assign_set_rhs2 (perm,
> + vect_get_slp_vect_def (child, i));
> + update_stmt (perm);
> + }
> + }
> }
> }
> }
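
The rhs1/rhs2 wiring for multiple vector copies generalizes the single-copy case: permute 0 shifts in from the PHI, permute i > 0 from latch def i-1, and the PHI's backedge takes the last latch def. A C emulation of that chaining (again my own sketch with made-up names, assuming VF == 4 and two copies):

```c
#include <assert.h>
#include <string.h>

#define VF 4
#define NCOPIES 2

/* With NCOPIES vector copies per iteration the permutes chain:
   perm[0] shifts in from the PHI, perm[c > 0] from latch def c-1,
   and the PHI backedge receives the last latch def.
   Requires n % (VF * NCOPIES) == 0.  */
static void
vector_loop_2copies (const int *a, int *b, int n, int init)
{
  int phi[VF];
  for (int l = 0; l < VF; l++)
    phi[l] = init;
  for (int i = 0; i < n; i += VF * NCOPIES)
    {
      int cur[NCOPIES][VF], perm[NCOPIES][VF];
      for (int c = 0; c < NCOPIES; c++)
	memcpy (cur[c], a + i + c * VF, sizeof cur[c]);
      for (int c = 0; c < NCOPIES; c++)
	{
	  /* rhs1: the PHI for the first copy, else the previous
	     latch def; rhs2: this copy's latch def.  */
	  const int *prev = c == 0 ? phi : cur[c - 1];
	  perm[c][0] = prev[VF - 1];
	  for (int l = 1; l < VF; l++)
	    perm[c][l] = cur[c][l - 1];
	}
      for (int c = 0; c < NCOPIES; c++)
	for (int l = 0; l < VF; l++)
	  b[i + c * VF + l] = perm[c][l] + cur[c][l];
      memcpy (phi, cur[NCOPIES - 1], sizeof phi);  /* last latch def */
    }
}
```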
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c8d1efc45e5..4e0d75e0d75 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11176,6 +11176,7 @@ vect_analyze_stmt (vec_info *vinfo,
> break;
>
> case vect_induction_def:
> + case vect_first_order_recurrence:
> gcc_assert (!bb_vinfo);
> break;
>
> @@ -11234,7 +11235,9 @@ vect_analyze_stmt (vec_info *vinfo,
> || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> cost_vec)
> || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> - stmt_info, NULL, node));
> + stmt_info, NULL, node)
> + || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> + stmt_info, NULL, node, cost_vec));
> else
> {
> if (bb_vinfo)
> @@ -11404,6 +11407,12 @@ vect_transform_stmt (vec_info *vinfo,
> gcc_assert (done);
> break;
>
> + case recurr_info_type:
> + done = vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> + stmt_info, &vec_stmt, slp_node, NULL);
> + gcc_assert (done);
> + break;
> +
> case phi_info_type:
> done = vectorizable_phi (vinfo, stmt_info, &vec_stmt, slp_node, NULL);
> gcc_assert (done);
> @@ -11804,6 +11813,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
> case vect_nested_cycle:
> dump_printf (MSG_NOTE, "nested cycle\n");
> break;
> + case vect_first_order_recurrence:
> + dump_printf (MSG_NOTE, "first order recurrence\n");
> + break;
> case vect_unknown_def_type:
> dump_printf (MSG_NOTE, "unknown\n");
> break;
> @@ -11852,7 +11864,8 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
> || *dt == vect_induction_def
> || *dt == vect_reduction_def
> || *dt == vect_double_reduction_def
> - || *dt == vect_nested_cycle)
> + || *dt == vect_nested_cycle
> + || *dt == vect_first_order_recurrence)
> {
> *vectype = STMT_VINFO_VECTYPE (def_stmt_info);
> gcc_assert (*vectype != NULL_TREE);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 4870c754499..016961da851 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -65,6 +65,7 @@ enum vect_def_type {
> vect_reduction_def,
> vect_double_reduction_def,
> vect_nested_cycle,
> + vect_first_order_recurrence,
> vect_unknown_def_type
> };
>
> @@ -1027,6 +1028,7 @@ enum stmt_vec_info_type {
> cycle_phi_info_type,
> lc_phi_info_type,
> phi_info_type,
> + recurr_info_type,
> loop_exit_ctrl_vec_info_type
> };
>
> @@ -2331,6 +2333,8 @@ extern bool vectorizable_lc_phi (loop_vec_info, stmt_vec_info,
> gimple **, slp_tree);
> extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree,
> stmt_vector_for_cost *);
> +extern bool vectorizable_recurr (loop_vec_info, stmt_vec_info,
> + gimple **, slp_tree, stmt_vector_for_cost *);
> extern bool vect_emulated_vector_p (tree);
> extern bool vect_can_vectorize_without_simd_p (tree_code);
> extern bool vect_can_vectorize_without_simd_p (code_helper);
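
For what it's worth, the shift itself is just a two-input VEC_PERM_EXPR whose selector is { VF-1, VF, VF+1, ..., 2*VF-2 }: lane 0 comes from the last lane of the first input (the previous vector), the remaining lanes from the front of the second. A small emulation of that selector (my own sketch, VF == 4 assumed):

```c
#include <assert.h>

#define VF 4

/* Generic two-input permute: sel[l] in [0, 2*VF) indexes the
   concatenation {in0, in1}, like a GIMPLE VEC_PERM_EXPR.  */
static void
vec_perm (const int *in0, const int *in1, const unsigned *sel, int *out)
{
  for (int l = 0; l < VF; l++)
    out[l] = sel[l] < VF ? in0[sel[l]] : in1[sel[l] - VF];
}

/* Selector for a first-order recurrence: { VF-1, VF, ..., 2*VF-2 }
   takes the last lane of the previous vector, then the first VF-1
   lanes of the current one.  */
static void
recurrence_sel (unsigned *sel)
{
  for (int l = 0; l < VF; l++)
    sel[l] = VF - 1 + l;
}
```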
Thread overview: 14+ messages
2022-10-10 11:03 Richard Biener
2022-10-10 11:13 ` juzhe.zhong
2022-10-10 13:57 ` Andrew Stubbs
2022-10-10 14:08 ` 钟居哲
2022-10-11 7:01 ` Richard Biener
2022-10-11 8:42 ` Richard Sandiford
2022-10-14 7:07 ` Richard Biener
2022-10-14 7:20 ` juzhe.zhong
2022-10-14 9:42 ` Andrew Stubbs
2022-10-14 9:46 ` Richard Biener
2022-10-17 8:48 ` Richard Sandiford [this message]
2022-10-11 8:34 ` juzhe.zhong
2022-10-17 12:14 ` Richard Biener
2022-10-12 9:48 ` Richard Sandiford