Hi, Richi. After several tries, I found a case that is "close to" have CSE opportunity in RVV for VL vectors: void __attribute__((noinline,noclone)) foo (uint16_t *out, uint16_t *res) { int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 }; int i; for (i = 0; i < 8; ++i) { if (mask[i]) out[i] = 33; } uint16_t o0 = out[0]; uint16_t o7 = out[3]; uint16_t o14 = out[6]; uint16_t o15 = out[7]; res[0] = o0; res[2] = o7; res[4] = o14; res[6] = o15; } The Gimple IR: _64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]); vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 0); mask__7.16_56 = vect__1.15_54 != { 0, ... }; .LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 0); You can see the "len" is always variable produced by SELECT_VL so it failed to have CSE opportunity. And I tried in ARM SVE: https://godbolt.org/z/63a6WcT9o It also fail to have CSE opportunity. It seems that it's difficult to have such CSE opportunity in VL vectors. juzhe.zhong@rivai.ai From: Richard Biener Date: 2023-06-27 15:47 To: juzhe.zhong@rivai.ai CC: gcc-patches; richard.sandiford; pan2.li Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote: > Hi, Richi. > > When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but actual actual nunits is 8 in that case, > then I failed to walk all elements analysis. > > >> The most simple thing would be to make this all conditional > >> to constant TYPE_VECTOR_SUBPARTS. Which is also why I was > >> asking for VL vector testcases. > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in intrinsics. I am not sure how to reproduce VL vectors > in C code. The original motivation was from unrolled vector code for the conditional masking case. Does RVV allow MASK_STORE from intrinsics? Otherwise how about for (i = 0; i < 8; ++i) a[i] = 42; for (i = 2; i < 6; ++i) b[i] = a[i]; with disabling complete unrolling -fdisable-tree-cunrolli both loops should get vectorized, hopefully not iterating(?) Then we end up with the overlapping CSE opportunity, no? Maybe it needs support for fixed size vectors and using VL vectors just for the epilog. Richard. > Thanks. > > > juzhe.zhong@rivai.ai > > From: Richard Biener > Date: 2023-06-27 15:33 > To: Ju-Zhe Zhong > CC: gcc-patches; richard.sandiford; pan2.li > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE > On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote: > > > From: Ju-Zhe Zhong > > > > Hi, Richi. > > > > I tried to understand your last email and to refactor the do-while loop using VECTOR_CST_NELTS. > > > > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant store. > > I have appended testcase in this patch to test VN for LEN_MASK_STORE. > > > > I am not sure whether I am on the same page with you. > > > > Feel free to correct me, Thanks. > > > > gcc/ChangeLog: > > > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and fix LEN_STORE > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New test. > > > > --- > > .../rvv/autovec/partial/len_maskstore_vn-1.c | 30 +++++++++++++++++++ > > gcc/tree-ssa-sccvn.cc | 24 +++++++++++---- > > 2 files changed, 49 insertions(+), 5 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > new file mode 100644 > > index 00000000000..0b2d03693dc > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > @@ -0,0 +1,30 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */ > > + > > +void __attribute__((noinline,noclone)) > > +foo (int *out, int *res) > > +{ > > + int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 }; > > + int i; > > + for (i = 0; i < 16; ++i) > > + { > > + if (mask[i]) > > + out[i] = i; > > + } > > + int o0 = out[0]; > > + int o7 = out[7]; > > + int o14 = out[14]; > > + int o15 = out[15]; > > + res[0] = o0; > > + res[2] = o7; > > + res[4] = o14; > > + res[6] = o15; > > +} > > + > > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two > > + vector iterations. FRE5 after that should be able to CSE > > + out[7] and out[15], but leave out[0] and out[14] alone. */ > > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */ > > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc > > index 11061a374a2..242d82d6274 100644 > > --- a/gcc/tree-ssa-sccvn.cc > > +++ b/gcc/tree-ssa-sccvn.cc > > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > > if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) > > return (void *)-1; > > break; > > + case IFN_LEN_MASK_STORE: > > + len = gimple_call_arg (call, 2); > > + bias = gimple_call_arg (call, 5); > > + if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) > > + return (void *)-1; > > + mask = gimple_call_arg (call, internal_fn_mask_index (fn)); > > + mask = vn_valueize (mask); > > + if (TREE_CODE (mask) != VECTOR_CST) > > + return (void *)-1; > > + break; > > default: > > return (void *)-1; > > } > > @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > > tree vectype = TREE_TYPE (def_rhs); > > unsigned HOST_WIDE_INT elsz > > = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))); > > + /* Set initial len value is the UINT_MAX, so mask_idx < actual_len > > + is always true for MASK_STORE. */ > > + unsigned actual_len = UINT_MAX; > > + if (len) > > + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias); > > + unsigned nunits > > + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]); > > No, that's not correct and what I meant. There's > vector_cst_encoded_nelts (mask), but for example for an > all-ones mask that would be 1. You'd then also not access > VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT. What I'm not sure > is how to recover the implicit walk over all elements for the > purpose of computing start/length and how generally this will > work for variable-length vectors where enumerating "all" > elements touched is required for correctness. > > The most simple thing would be to make this all conditional > to constant TYPE_VECTOR_SUBPARTS. Which is also why I was > asking for VL vector testcases. > > Richard. > > > if (mask) > > { > > HOST_WIDE_INT start = 0, length = 0; > > - unsigned mask_idx = 0; > > - do > > + for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++) > > { > > if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx))) > > { > > @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > > } > > else > > length += elsz; > > - mask_idx++; > > } > > - while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype))); > > if (length != 0) > > { > > pd.rhs_off = start; > > @@ -3389,7 +3403,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > > { > > pd.offset = offset2i; > > pd.size = (tree_to_uhwi (len) > > - + -tree_to_shwi (bias)) * BITS_PER_UNIT; > > + + tree_to_shwi (bias)) * BITS_PER_UNIT; > > if (BYTES_BIG_ENDIAN) > > pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype)); > > else > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)