public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: rguenther <rguenther@suse.de>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	 richard.sandiford <richard.sandiford@arm.com>,
	 pan2.li <pan2.li@intel.com>
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
Date: Tue, 27 Jun 2023 16:47:08 +0800	[thread overview]
Message-ID: <58295AB5B5EEE187+2023062716470792471023@rivai.ai> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2306270830130.4723@jbgna.fhfr.qr>

[-- Attachment #1: Type: text/plain, Size: 10199 bytes --]

Hi, Richi.  After reading your emails.

Is that correct that I put supporting LEN_MASK_STORE into SCCVN on hold for now ?

Go ahead to the next RVV auto-vectorization support patterns in middle-end (for example I sent to add optabs and internal fn for LEN_MASK_GATHER_LOAD).
Then, after I finish all the RVV auto-vectorization patterns in middle-end, I come back to take a look at LEN_MASK_STORE in SCCVN?

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-06-27 16:34
To: juzhe.zhong@rivai.ai
CC: gcc-patches; richard.sandiford; pan2.li
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote:
 
> Hi, Richi.
> After several tries, I found a case that is "close to" have CSE opportunity in RVV for VL vectors:
> 
> void __attribute__((noinline,noclone))
> foo (uint16_t *out, uint16_t *res)
> {
>   int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 };
>   int i;
>   for (i = 0; i < 8; ++i)
>     {
>       if (mask[i])
>         out[i] = 33;
>     }
>   uint16_t o0 = out[0];
>   uint16_t o7 = out[3];
>   uint16_t o14 = out[6];
>   uint16_t o15 = out[7];
>   res[0] = o0;
>   res[2] = o7;
>   res[4] = o14;
>   res[6] = o15;
> }
> 
> The Gimple IR:
>   _64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]);
>   vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 0);
>   mask__7.16_56 = vect__1.15_54 != { 0, ... };
>   .LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 0);
> 
> You can see the "len" is always variable produced by SELECT_VL so it failed to have CSE opportunity.
> And I tried in ARM SVE: https://godbolt.org/z/63a6WcT9o
> It also fail to have CSE opportunity.
> 
> It seems that it's difficult to have such CSE opportunity in VL vectors.
 
Ah.  Nice example.  This shows we fail to constant fold
[vec_unpack_lo_expr] { -1, -1, ..., 0, 0, 0, ... } which means we
fail to fold the .MASK_LOAD from 'mask' (that's not something we
support, see my previous answer) and that means the .MASK_STORE mask
doesn't end up constant.
 
I understant that SVE cannot easily generate all constant [masks]
so we probably shouldn't aggressively perform elimination on
constant foldings but for actual constant folding of dependent stmts
it might be nice to have the constant folded masks.
 
I will open a bugreport with this testcase.
 
Richard.
 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 15:47
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but actual actual nunits is 8 in that case,
> > then I failed to walk all elements analysis.
> > 
> > >> The most simple thing would be to make this all conditional
> > >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > >> asking for VL vector testcases.
> > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in intrinsics. I am not sure how to reproduce VL vectors
> > in C code.
>  
> The original motivation was from unrolled vector code for the
> conditional masking case.  Does RVV allow MASK_STORE from
> intrinsics?
>  
> Otherwise how about
>  
> for (i = 0; i < 8; ++i)
>    a[i] = 42;
> for (i = 2; i < 6; ++i)
>    b[i] = a[i];
>  
> with disabling complete unrolling -fdisable-tree-cunrolli both
> loops should get vectorized, hopefully not iterating(?)  Then
> we end up with the overlapping CSE opportunity, no?  Maybe
> it needs support for fixed size vectors and using VL vectors
> just for the epilog.
>  
> Richard.
>  
>  
> > Thanks.
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-06-27 15:33
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford; pan2.li
> > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> > On Tue, 27 Jun 2023, juzhe.zhong@rivai.ai wrote:
> >  
> > > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > > 
> > > Hi, Richi.
> > > 
> > > I tried to understand your last email and to refactor the do-while loop using VECTOR_CST_NELTS.
> > > 
> > > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant store.
> > > I have appended testcase in this patch to test VN for LEN_MASK_STORE.
> > > 
> > > I am not sure whether I am on the same page with you.
> > > 
> > > Feel free to correct me, Thanks.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >         * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and fix LEN_STORE
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >         * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New test.
> > > 
> > > ---
> > >  .../rvv/autovec/partial/len_maskstore_vn-1.c  | 30 +++++++++++++++++++
> > >  gcc/tree-ssa-sccvn.cc                         | 24 +++++++++++----
> > >  2 files changed, 49 insertions(+), 5 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > 
> > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > new file mode 100644
> > > index 00000000000..0b2d03693dc
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > @@ -0,0 +1,30 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */
> > > +
> > > +void __attribute__((noinline,noclone))
> > > +foo (int *out, int *res)
> > > +{
> > > +  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> > > +  int i;
> > > +  for (i = 0; i < 16; ++i)
> > > +    {
> > > +      if (mask[i])
> > > +        out[i] = i;
> > > +    }
> > > +  int o0 = out[0];
> > > +  int o7 = out[7];
> > > +  int o14 = out[14];
> > > +  int o15 = out[15];
> > > +  res[0] = o0;
> > > +  res[2] = o7;
> > > +  res[4] = o14;
> > > +  res[6] = o15;
> > > +}
> > > +
> > > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two
> > > +   vector iterations.  FRE5 after that should be able to CSE
> > > +   out[7] and out[15], but leave out[0] and out[14] alone.  */
> > > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> > > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> > > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> > > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */
> > > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > > index 11061a374a2..242d82d6274 100644
> > > --- a/gcc/tree-ssa-sccvn.cc
> > > +++ b/gcc/tree-ssa-sccvn.cc
> > > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > >    if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > >      return (void *)-1;
> > >    break;
> > > + case IFN_LEN_MASK_STORE:
> > > +   len = gimple_call_arg (call, 2);
> > > +   bias = gimple_call_arg (call, 5);
> > > +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > > +     return (void *)-1;
> > > +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> > > +   mask = vn_valueize (mask);
> > > +   if (TREE_CODE (mask) != VECTOR_CST)
> > > +     return (void *)-1;
> > > +   break;
> > >  default:
> > >    return (void *)-1;
> > >  }
> > > @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > >        tree vectype = TREE_TYPE (def_rhs);
> > >        unsigned HOST_WIDE_INT elsz
> > >  = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> > > +       /* Set initial len value is the UINT_MAX, so mask_idx < actual_len
> > > + is always true for MASK_STORE.  */
> > > +       unsigned actual_len = UINT_MAX;
> > > +       if (len)
> > > + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias);
> > > +       unsigned nunits
> > > + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]);
> >  
> > No, that's not correct and what I meant.  There's
> > vector_cst_encoded_nelts (mask), but for example for an
> > all-ones mask that would be 1.  You'd then also not access
> > VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT.  What I'm not sure
> > is how to recover the implicit walk over all elements for the
> > purpose of computing start/length and how generally this will
> > work for variable-length vectors where enumerating "all"
> > elements touched is required for correctness.
> >  
> > The most simple thing would be to make this all conditional
> > to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > asking for VL vector testcases.
> >  
> > Richard.
> >  
> > >        if (mask)
> > >  {
> > >    HOST_WIDE_INT start = 0, length = 0;
> > > -   unsigned mask_idx = 0;
> > > -   do
> > > +   for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++)
> > >      {
> > >        if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx)))
> > >  {
> > > @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > >  }
> > >        else
> > >  length += elsz;
> > > -       mask_idx++;
> > >      }
> > > -   while (known_lt (mask_idx, TYPE_VECTOR_SUBPARTS (vectype)));
> > >    if (length != 0)
> > >      {
> > >        pd.rhs_off = start;
> > > @@ -3389,7 +3403,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
> > >  {
> > >    pd.offset = offset2i;
> > >    pd.size = (tree_to_uhwi (len)
> > > -      + -tree_to_shwi (bias)) * BITS_PER_UNIT;
> > > +      + tree_to_shwi (bias)) * BITS_PER_UNIT;
> > >    if (BYTES_BIG_ENDIAN)
> > >      pd.rhs_off = pd.size - tree_to_uhwi (TYPE_SIZE (vectype));
> > >    else
> > > 
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
 

  reply	other threads:[~2023-06-27  8:47 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-27  6:47 juzhe.zhong
2023-06-27  7:33 ` Richard Biener
2023-06-27  7:41   ` juzhe.zhong
2023-06-27  7:47     ` Richard Biener
2023-06-27  8:01       ` juzhe.zhong
2023-06-27  8:28         ` Richard Biener
2023-06-27  8:09       ` juzhe.zhong
2023-06-27  8:34         ` Richard Biener
2023-06-27  8:47           ` juzhe.zhong [this message]
2023-06-27  8:56             ` Richard Biener
2023-06-27  8:59               ` juzhe.zhong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58295AB5B5EEE187+2023062716470792471023@rivai.ai \
    --to=juzhe.zhong@rivai.ai \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=pan2.li@intel.com \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).