From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: rguenther <rguenther@suse.de>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
richard.sandiford <richard.sandiford@arm.com>,
pan2.li <pan2.li@intel.com>
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
Date: Tue, 27 Jun 2023 16:09:28 +0800 [thread overview]
Message-ID: <83AC7519E4D8FEE2+2023062716092805152420@rivai.ai> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2306270743260.4723@jbgna.fhfr.qr>
[-- Attachment #1: Type: text/plain, Size: 8431 bytes --]
Hi, Richi.
After several tries, I found a case that is "close to" have CSE opportunity in RVV for VL vectors:
void __attribute__((noinline,noclone))
foo (uint16_t *out, uint16_t *res)
{
int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 };
int i;
for (i = 0; i < 8; ++i)
{
if (mask[i])
out[i] = 33;
}
uint16_t o0 = out[0];
uint16_t o7 = out[3];
uint16_t o14 = out[6];
uint16_t o15 = out[7];
res[0] = o0;
res[2] = o7;
res[4] = o14;
res[6] = o15;
}
The Gimple IR:
_64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]);
vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 0);
mask__7.16_56 = vect__1.15_54 != { 0, ... };
.LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 0);
You can see the "len" is always variable produced by SELECT_VL so it failed to have CSE opportunity.
And I tried in ARM SVE: https://godbolt.org/z/63a6WcT9o
It also fail to have CSE opportunity.
It seems that it's difficult to have such CSE opportunity in VL vectors.
juzhe.zhong@rivai.ai
From: Richard Biener
Date: 2023-06-27 15:47
To: juzhe.zhong@rivai.ai
CC: gcc-patches; richard.sandiford; pan2.li
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote:
> Hi, Richi.
>
> When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but actual actual nunits is 8 in that case,
> then I failed to walk all elements analysis.
>
> >> The most simple thing would be to make this all conditional
> >> to constant TYPE_VECTOR_SUBPARTS. Which is also why I was
> >> asking for VL vector testcases.
> Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in intrinsics. I am not sure how to reproduce VL vectors
> in C code.
The original motivation was from unrolled vector code for the
conditional masking case. Does RVV allow MASK_STORE from
intrinsics?
Otherwise how about
for (i = 0; i < 8; ++i)
a[i] = 42;
for (i = 2; i < 6; ++i)
b[i] = a[i];
with disabling complete unrolling -fdisable-tree-cunrolli both
loops should get vectorized, hopefully not iterating(?) Then
we end up with the overlapping CSE opportunity, no? Maybe
it needs support for fixed size vectors and using VL vectors
just for the epilog.
Richard.
> Thanks.
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Biener
> Date: 2023-06-27 15:33
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote:
>
> > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> >
> > Hi, Richi.
> >
> > I tried to understand your last email and to refactor the do-while loop using VECTOR_CST_NELTS.
> >
> > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant store.
> > I have appended testcase in this patch to test VN for LEN_MASK_STORE.
> >
> > I am not sure whether I am on the same page with you.
> >
> > Feel free to correct me, Thanks.
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and fix LEN_STORE
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New test.
> >
> > ---
> > .../rvv/autovec/partial/len_maskstore_vn-1.c | 30 +++++++++++++++++++
> > gcc/tree-ssa-sccvn.cc | 24 +++++++++++----
> > 2 files changed, 49 insertions(+), 5 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > new file mode 100644
> > index 00000000000..0b2d03693dc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */
> > +
> > +void __attribute__((noinline,noclone))
> > +foo (int *out, int *res)
> > +{
> > + int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> > + int i;
> > + for (i = 0; i < 16; ++i)
> > + {
> > + if (mask[i])
> > + out[i] = i;
> > + }
> > + int o0 = out[0];
> > + int o7 = out[7];
> > + int o14 = out[14];
> > + int o15 = out[15];
> > + res[0] = o0;
> > + res[2] = o7;
> > + res[4] = o14;
> > + res[6] = o15;
> > +}
> > +
> > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two
> > + vector iterations. FRE5 after that should be able to CSE
> > + out[7] and out[15], but leave out[0] and out[14] alone. */
> > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 11061a374a2..242d82d6274 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > return (void *)-1;
> > break;
> > + case IFN_LEN_MASK_STORE:
> > + len = gimple_call_arg (call, 2);
> > + bias = gimple_call_arg (call, 5);
> > + if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > + return (void *)-1;
> > + mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> > + mask = vn_valueize (mask);
> > + if (TREE_CODE (mask) != VECTOR_CST)
> > + return (void *)-1;
> > + break;
> > default:
> > return (void *)-1;
> > }
> > @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > tree vectype = TREE_TYPE (def_rhs);
> > unsigned HOST_WIDE_INT elsz
> > = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> > + /* Set initial len value is the UINT_MAX, so mask_idx < actual_len
> > + is always true for MASK_STORE. */
> > + unsigned actual_len = UINT_MAX;
> > + if (len)
> > + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias);
> > + unsigned nunits
> > + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]);
>
> No, that's not correct and what I meant. There's
> vector_cst_encoded_nelts (mask), but for example for an
> all-ones mask that would be 1. You'd then also not access
> VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT. What I'm not sure
> is how to recover the implicit walk over all elements for the
> purpose of computing start/length and how generally this will
> work for variable-length vectors where enumerating "all"
> elements touched is required for correctness.
>
> The most simple thing would be to make this all conditional
> to constant TYPE_VECTOR_SUBPARTS. Which is also why I was
> asking for VL vector testcases.
>
> Richard.
>
> > if (mask)
> > {
> > HOST_WIDE_INT start = 0, length = 0;
> > - unsigned mask_idx = 0;
> > - do
> > + for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++)
> > {
> > if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx)))
> > {
> > @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > }
> > else
> > length += elsz;
> > - mask_idx++;
> > }
> > - while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> > if (length != 0)
> > {
> > pd.rhs_off = start;
> > @@ -3389,7 +3403,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > {
> > pd.offset = offset2i;
> > pd.size = (tree_to_uhwi (len)
> > - + -tree_to_shwi (bias)) * BITS_PER_UNIT;
> > + + tree_to_shwi (bias)) * BITS_PER_UNIT;
> > if (BYTES_BIG_ENDIAN)
> > pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
> > else
> >
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
next prev parent reply other threads:[~2023-06-27 8:09 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-27 6:47 juzhe.zhong
2023-06-27 7:33 ` Richard Biener
2023-06-27 7:41 ` juzhe.zhong
2023-06-27 7:47 ` Richard Biener
2023-06-27 8:01 ` juzhe.zhong
2023-06-27 8:28 ` Richard Biener
2023-06-27 8:09 ` juzhe.zhong [this message]
2023-06-27 8:34 ` Richard Biener
2023-06-27 8:47 ` juzhe.zhong
2023-06-27 8:56 ` Richard Biener
2023-06-27 8:59 ` juzhe.zhong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=83AC7519E4D8FEE2+2023062716092805152420@rivai.ai \
--to=juzhe.zhong@rivai.ai \
--cc=gcc-patches@gcc.gnu.org \
--cc=pan2.li@intel.com \
--cc=rguenther@suse.de \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).