From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: rguenther <rguenther@suse.de>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
richard.sandiford <richard.sandiford@arm.com>,
pan2.li <pan2.li@intel.com>
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
Date: Tue, 27 Jun 2023 15:41:31 +0800 [thread overview]
Message-ID: <2A71A0EC48A425B7+2023062715413073068911@rivai.ai> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2306270729530.4723@jbgna.fhfr.qr>
[-- Attachment #1: Type: text/plain, Size: 6233 bytes --]
Hi, Richi.
When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but actual actual nunits is 8 in that case,
then I failed to walk all elements analysis.
>> The most simple thing would be to make this all conditional
>> to constant TYPE_VECTOR_SUBPARTS. Which is also why I was
>> asking for VL vector testcases.
Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in intrinsics. I am not sure how to reproduce VL vectors
in C code.
Thanks.
juzhe.zhong@rivai.ai
From: Richard Biener
Date: 2023-06-27 15:33
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford; pan2.li
Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
>
> Hi, Richi.
>
> I tried to understand your last email and to refactor the do-while loop using VECTOR_CST_NELTS.
>
> This patch works fine for LEN_MASK_STORE and compiler can CSE redundant store.
> I have appended testcase in this patch to test VN for LEN_MASK_STORE.
>
> I am not sure whether I am on the same page with you.
>
> Feel free to correct me, Thanks.
>
> gcc/ChangeLog:
>
> * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and fix LEN_STORE
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New test.
>
> ---
> .../rvv/autovec/partial/len_maskstore_vn-1.c | 30 +++++++++++++++++++
> gcc/tree-ssa-sccvn.cc | 24 +++++++++++----
> 2 files changed, 49 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> new file mode 100644
> index 00000000000..0b2d03693dc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */
> +
> +void __attribute__((noinline,noclone))
> +foo (int *out, int *res)
> +{
> + int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> + int i;
> + for (i = 0; i < 16; ++i)
> + {
> + if (mask[i])
> + out[i] = i;
> + }
> + int o0 = out[0];
> + int o7 = out[7];
> + int o14 = out[14];
> + int o15 = out[15];
> + res[0] = o0;
> + res[2] = o7;
> + res[4] = o14;
> + res[6] = o15;
> +}
> +
> +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two
> + vector iterations. FRE5 after that should be able to CSE
> + out[7] and out[15], but leave out[0] and out[14] alone. */
> +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 11061a374a2..242d82d6274 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> return (void *)-1;
> break;
> + case IFN_LEN_MASK_STORE:
> + len = gimple_call_arg (call, 2);
> + bias = gimple_call_arg (call, 5);
> + if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> + return (void *)-1;
> + mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> + mask = vn_valueize (mask);
> + if (TREE_CODE (mask) != VECTOR_CST)
> + return (void *)-1;
> + break;
> default:
> return (void *)-1;
> }
> @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> tree vectype = TREE_TYPE (def_rhs);
> unsigned HOST_WIDE_INT elsz
> = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> + /* Set initial len value is the UINT_MAX, so mask_idx < actual_len
> + is always true for MASK_STORE. */
> + unsigned actual_len = UINT_MAX;
> + if (len)
> + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias);
> + unsigned nunits
> + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]);
No, that's not correct and what I meant. There's
vector_cst_encoded_nelts (mask), but for example for an
all-ones mask that would be 1. You'd then also not access
VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT. What I'm not sure
is how to recover the implicit walk over all elements for the
purpose of computing start/length and how generally this will
work for variable-length vectors where enumerating "all"
elements touched is required for correctness.
The most simple thing would be to make this all conditional
to constant TYPE_VECTOR_SUBPARTS. Which is also why I was
asking for VL vector testcases.
Richard.
> if (mask)
> {
> HOST_WIDE_INT start = 0, length = 0;
> - unsigned mask_idx = 0;
> - do
> + for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++)
> {
> if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx)))
> {
> @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> }
> else
> length += elsz;
> - mask_idx++;
> }
> - while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> if (length != 0)
> {
> pd.rhs_off = start;
> @@ -3389,7 +3403,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> {
> pd.offset = offset2i;
> pd.size = (tree_to_uhwi (len)
> - + -tree_to_shwi (bias)) * BITS_PER_UNIT;
> + + tree_to_shwi (bias)) * BITS_PER_UNIT;
> if (BYTES_BIG_ENDIAN)
> pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
> else
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
next prev parent reply other threads:[~2023-06-27 7:41 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-27 6:47 juzhe.zhong
2023-06-27 7:33 ` Richard Biener
2023-06-27 7:41 ` juzhe.zhong [this message]
2023-06-27 7:47 ` Richard Biener
2023-06-27 8:01 ` juzhe.zhong
2023-06-27 8:28 ` Richard Biener
2023-06-27 8:09 ` juzhe.zhong
2023-06-27 8:34 ` Richard Biener
2023-06-27 8:47 ` juzhe.zhong
2023-06-27 8:56 ` Richard Biener
2023-06-27 8:59 ` juzhe.zhong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2A71A0EC48A425B7+2023062715413073068911@rivai.ai \
--to=juzhe.zhong@rivai.ai \
--cc=gcc-patches@gcc.gnu.org \
--cc=pan2.li@intel.com \
--cc=rguenther@suse.de \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).