From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by sourceware.org (Postfix) with ESMTPS id B83AA3858D33 for ; Tue, 27 Jun 2023 07:47:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B83AA3858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id DC3C01F8BD; Tue, 27 Jun 2023 07:47:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1687852050; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=nZLBm5Rn4lJrrxlUXksTawscIvvnV7QpWWYPHHQS27E=; b=c1SRViv1DRDxgBCM9Hfu/HOQwNJp6KC/fwdNYwJM9gwZ7Va9McXuvKOa9jiQu2NB4ordJw M/lznakHsqfff9aRat0fF5o+qdrNeolYa37L5l8fEOVz+HxtxF/1a8rCfnRlxkvQ+6Rodm ddYW3piAWqDT+DYxfcu5yPKhFFncdF8= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1687852050; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=nZLBm5Rn4lJrrxlUXksTawscIvvnV7QpWWYPHHQS27E=; b=As2XfoqA8FFRX9jdsuYJlIRXX6bFt02vW5BzaFxTX0I0QhO8eRWvkfDt7I5Bt+ziQ29hCf TcO7LjP9/szB5JAQ== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id C492D2C141; Tue, 27 Jun 2023 07:47:30 +0000 (UTC) Date: Tue, 27 Jun 2023 07:47:30 +0000 (UTC) From: Richard Biener To: "juzhe.zhong@rivai.ai" cc: gcc-patches , "richard.sandiford" , "pan2.li" Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE In-Reply-To: <2A71A0EC48A425B7+2023062715413073068911@rivai.ai> Message-ID: References: <20230627064737.16257-1-juzhe.zhong@rivai.ai>, <2A71A0EC48A425B7+2023062715413073068911@rivai.ai> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote: > Hi, Richi. > > When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but actual actual nunits is 8 in that case, > then I failed to walk all elements analysis. > > >> The most simple thing would be to make this all conditional > >> to constant TYPE_VECTOR_SUBPARTS. Which is also why I was > >> asking for VL vector testcases. > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in intrinsics. I am not sure how to reproduce VL vectors > in C code. The original motivation was from unrolled vector code for the conditional masking case. Does RVV allow MASK_STORE from intrinsics? Otherwise how about for (i = 0; i < 8; ++i) a[i] = 42; for (i = 2; i < 6; ++i) b[i] = a[i]; with disabling complete unrolling -fdisable-tree-cunrolli both loops should get vectorized, hopefully not iterating(?) Then we end up with the overlapping CSE opportunity, no? Maybe it needs support for fixed size vectors and using VL vectors just for the epilog. Richard. > Thanks. > > > juzhe.zhong@rivai.ai > > From: Richard Biener > Date: 2023-06-27 15:33 > To: Ju-Zhe Zhong > CC: gcc-patches; richard.sandiford; pan2.li > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE > On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote: > > > From: Ju-Zhe Zhong > > > > Hi, Richi. > > > > I tried to understand your last email and to refactor the do-while loop using VECTOR_CST_NELTS. > > > > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant store. > > I have appended testcase in this patch to test VN for LEN_MASK_STORE. > > > > I am not sure whether I am on the same page with you. > > > > Feel free to correct me, Thanks. > > > > gcc/ChangeLog: > > > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and fix LEN_STORE > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New test. > > > > --- > > .../rvv/autovec/partial/len_maskstore_vn-1.c | 30 +++++++++++++++++++ > > gcc/tree-ssa-sccvn.cc | 24 +++++++++++---- > > 2 files changed, 49 insertions(+), 5 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > new file mode 100644 > > index 00000000000..0b2d03693dc > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c > > @@ -0,0 +1,30 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */ > > + > > +void __attribute__((noinline,noclone)) > > +foo (int *out, int *res) > > +{ > > + int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 }; > > + int i; > > + for (i = 0; i < 16; ++i) > > + { > > + if (mask[i]) > > + out[i] = i; > > + } > > + int o0 = out[0]; > > + int o7 = out[7]; > > + int o14 = out[14]; > > + int o15 = out[15]; > > + res[0] = o0; > > + res[2] = o7; > > + res[4] = o14; > > + res[6] = o15; > > +} > > + > > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two > > + vector iterations. FRE5 after that should be able to CSE > > + out[7] and out[15], but leave out[0] and out[14] alone. */ > > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */ > > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */ > > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc > > index 11061a374a2..242d82d6274 100644 > > --- a/gcc/tree-ssa-sccvn.cc > > +++ b/gcc/tree-ssa-sccvn.cc > > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > > if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) > > return (void *)-1; > > break; > > + case IFN_LEN_MASK_STORE: > > + len = gimple_call_arg (call, 2); > > + bias = gimple_call_arg (call, 5); > > + if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias)) > > + return (void *)-1; > > + mask = gimple_call_arg (call, internal_fn_mask_index (fn)); > > + mask = vn_valueize (mask); > > + if (TREE_CODE (mask) != VECTOR_CST) > > + return (void *)-1; > > + break; > > default: > > return (void *)-1; > > } > > @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > > tree vectype = TREE_TYPE (def_rhs); > > unsigned HOST_WIDE_INT elsz > > = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))); > > + /* Set initial len value is the UINT_MAX, so mask_idx < actual_len > > + is always true for MASK_STORE. */ > > + unsigned actual_len = UINT_MAX; > > + if (len) > > + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias); > > + unsigned nunits > > + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]); > > No, that's not correct and what I meant. There's > vector_cst_encoded_nelts (mask), but for example for an > all-ones mask that would be 1. You'd then also not access > VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT. What I'm not sure > is how to recover the implicit walk over all elements for the > purpose of computing start/length and how generally this will > work for variable-length vectors where enumerating "all" > elements touched is required for correctness. > > The most simple thing would be to make this all conditional > to constant TYPE_VECTOR_SUBPARTS. Which is also why I was > asking for VL vector testcases. > > Richard. > > > if (mask) > > { > > HOST_WIDE_INT start = 0, length = 0; > > - unsigned mask_idx = 0; > > - do > > + for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++) > > { > > if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx))) > > { > > @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > > } > > else > > length += elsz; > > - mask_idx++; > > } > > - while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype))); > > if (length != 0) > > { > > pd.rhs_off = start; > > @@ -3389,7 +3403,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_, > > { > > pd.offset = offset2i; > > pd.size = (tree_to_uhwi (len) > > - + -tree_to_shwi (bias)) * BITS_PER_UNIT; > > + + tree_to_shwi (bias)) * BITS_PER_UNIT; > > if (BYTES_BIG_ENDIAN) > > pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype)); > > else > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)