Hi, Richi. When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but actual actual nunits is 8 in that case, then I failed to walk all elements analysis. >> The most simple thing would be to make this all conditional >> to constant TYPE_VECTOR_SUBPARTS. Which is also why I was >> asking for VL vector testcases. Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in intrinsics. I am not sure how to reproduce VL vectors in C code. Thanks. juzhe.zhong@rivai.ai From: Richard Biener Date: 2023-06-27 15:33 To: Ju-Zhe Zhong CC: gcc-patches; richard.sandiford; pan2.li Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote: > From: Ju-Zhe Zhong > > Hi, Richi. > > I tried to understand your last email and to refactor the do-while loop using VECTOR_CST_NELTS. > > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant store. > I have appended testcase in this patch to test VN for LEN_MASK_STORE. > > I am not sure whether I am on the same page with you. > > Feel free to correct me, Thanks. > > gcc/ChangeLog: > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and fix LEN_STORE > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New test. > > --- > .../rvv/autovec/partial/len_maskstore_vn-1.c | 30 +++++++++++++++++++ > gcc/tree-ssa-sccvn.cc | 24 +++++++++++---- > 2 files changed, 49 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > new file mode 100644 > index 00000000000..0b2d03693dc > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > @@ -0,0 +1,30 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */ > + > +void __attribute__((noinline,noclone)) > +foo (int *out, int *res) > +{ > + int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 }; > + int i; > + for (i = 0; i < 16; ++i) > + { > + if (mask[i]) > + out[i] = i; > + } > + int o0 = out[0]; > + int o7 = out[7]; > + int o14 = out[14]; > + int o15 = out[15]; > + res[0] = o0; > + res[2] = o7; > + res[4] = o14; > + res[6] = o15; > +} > + > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two > + vector iterations. FRE5 after that should be able to CSE > + out[7] and out[15], but leave out[0] and out[14] alone. */ > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */ > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */ > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */ > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */ > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc > index 11061a374a2..242d82d6274 100644 > --- a/gcc/tree-ssa-sccvn.cc > +++ b/gcc/tree-ssa-sccvn.cc > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) > return (void *)-1; > break; > + case IFN_LEN_MASK_STORE: > + len = gimple_call_arg (call, 2); > + bias = gimple_call_arg (call, 5); > + if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) > + return (void *)-1; > + mask = gimple_call_arg (call, internal_fn_mask_index (fn)); > + mask = vn_valueize (mask); > + if (TREE_CODE (mask) != VECTOR_CST) > + return (void *)-1; > + break; > default: > return (void *)-1; > } > @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > tree vectype = TREE_TYPE (def_rhs); > unsigned HOST_WIDE_INT elsz > = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))); > + /* Set initial len value is the UINT_MAX, so mask_idx < actual_len > + is always true for MASK_STORE. */ > + unsigned actual_len = UINT_MAX; > + if (len) > + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias); > + unsigned nunits > + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]); No, that's not correct and what I meant. There's vector_cst_encoded_nelts (mask), but for example for an all-ones mask that would be 1. You'd then also not access VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT. What I'm not sure is how to recover the implicit walk over all elements for the purpose of computing start/length and how generally this will work for variable-length vectors where enumerating "all" elements touched is required for correctness. The most simple thing would be to make this all conditional to constant TYPE_VECTOR_SUBPARTS. Which is also why I was asking for VL vector testcases. Richard. > if (mask) > { > HOST_WIDE_INT start = 0, length = 0; > - unsigned mask_idx = 0; > - do > + for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++) > { > if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx))) > { > @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > } > else > length += elsz; > - mask_idx++; > } > - while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype))); > if (length != 0) > { > pd.rhs_off = start; > @@ -3389,7 +3403,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > { > pd.offset = offset2i; > pd.size = (tree_to_uhwi (len) > - + -tree_to_shwi (bias)) * BITS_PER_UNIT; > + + tree_to_shwi (bias)) * BITS_PER_UNIT; > if (BYTES_BIG_ENDIAN) > pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype)); > else > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)