public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	 "richard.sandiford" <richard.sandiford@arm.com>
Subject: Re: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization
Date: Wed, 9 Aug 2023 12:21:22 +0000 (UTC)	[thread overview]
Message-ID: <nycvar.YFH.7.77.849.2308091201530.12935@jbgna.fhfr.qr> (raw)
In-Reply-To: <42D594BE5999C7CD+2023080919153159922921@rivai.ai>

On Wed, 9 Aug 2023, juzhe.zhong@rivai.ai wrote:

> Hi, Richi.
> 
> >> that should be
> 
> >>   || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> >>       && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> 
> >> I think.  It seems to imply that SLP isn't supported with
> >> masking/lengthing.
> 
> Oh, yes.  At first glance, the original code is quite suspicious and your comments make sense to me.
> 
> >> Hum, how does CFN_EXTRACT_LAST handle both mask and length transparently?
> >> Don't you need some CFN_LEN_EXTRACT_LAST instead?
> 
> I think CFN_EXTRACT_LAST always has either loop mask or loop len.
> 
> When both mask and length are not needed, IMHO, I think current BIT_FIELD_REF flow is good enough:
> https://godbolt.org/z/Yr5M9hcc6
> 
> So I think we don't need CFN_LEN_EXTRACT_LAST. 
> 
> Instead, I think we will need CFN_LEN_FOLD_EXTRACT_LAST in the next patch.
> 
> Feel free to correct me it I am wrong.

Richard S. should know best, but I don't think FOLD_EXTRACT_LAST is
any different from EXTRACT_LAST (the value for EXTRACT_LAST with
all zeros mask seems unspecified?).
Note the expanders are documented
as to receive 'mask' operands, not 'len' ones (and we'd miss BIAS).

As for SLP support the loop mask should have all SLP lanes
consistently masked/unmasked (same for 'len' I suppose), but we
want to extract a specific SLP lane only.  For masks I think
producing a mask that has all 'i' SLP lanes enabled and AND
that to the mask would select the proper lane for EXTRACT_LAST.
Not sure how to handle this for 'len' - I guess since 'len'
covers all SLP lanes as well we could just subtract
SLP_TREE_LANES (node) - slp_index from it?  I'll note we don't
handle ncopies > 1 which I think we could with using FOLD_EXTRACT_LAST?

Richard.

> Thanks.
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-09 19:00
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization
> On Wed, 9 Aug 2023, juzhe.zhong@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > 
> > Hi, this patch is adding loop len control on extract_last autovectorization.
> > 
> > Consider this following case:
> > 
> > #include <stdint.h>
> > 
> > #define EXTRACT_LAST(TYPE) \
> >   TYPE __attribute__ ((noinline, noclone)) \
> >   test_##TYPE (TYPE *x, int n, TYPE value) \
> >   { \
> >     TYPE last; \
> >     for (int j = 0; j < n; ++j) \
> >       { \
> > last = x[j]; \
> > x[j] = last * value; \
> >       } \
> >     return last; \
> >   }
> > 
> > #define TEST_ALL(T) \
> >   T (uint8_t) \
> > 
> > TEST_ALL (EXTRACT_LAST)
> > 
> > ARM SVE IR:
> > 
> > Preheader:
> >   max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });
> > 
> > Loop:
> >   ...
> >   # loop_mask_22 = PHI <next_mask_35(4), max_mask_34(3)>
> >   ...
> >   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
> >   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
> >   .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
> >   ...
> >   next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
> >   ...
> > 
> > Epilogue:
> >   _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);
> > 
> > For RVV since we prefer len in loop control, after this patch for RVV:
> > 
> > Loop:
> >   ...
> >   loop_len_22 = SELECT_VL;
> >   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
> >   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
> >   .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
> >   ...
> > 
> > Epilogue:
> >   _25 = .EXTRACT_LAST (loop_len_22, vect_last_12.8_23);
> > 
> > This patch didn't add a new pattern for length loop control of extract_last.
> > Instead we reuse current extract_last.
> > 
> > Here is the code:
> > 
> > Step 1 - Enable length and record length for extract_last:
> > 
> > +       machine_mode vec_mode = TYPE_MODE (vectype);
> > +       if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
> > + vect_record_loop_len (loop_vinfo,
> > +       &LOOP_VINFO_LENS (loop_vinfo), 1,
> > +       vectype, 1);
> > +       else
> > + vect_record_loop_mask (loop_vinfo,
> > +        &LOOP_VINFO_MASKS (loop_vinfo), 1,
> > +        vectype, NULL);
> > 
> > We use 'get_len_load_store_mode' to check whether targets support loop len control or not.
> > If yes, record a loop len.
> > 
> > Step 2 - Build EXTRACT_LAST with len:
> > 
> > -   tree mask = vect_get_loop_mask (loop_vinfo, gsi,
> > -   &LOOP_VINFO_MASKS (loop_vinfo),
> > -   1, vectype, 0);
> > +   tree control;
> > +   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > +     control = vect_get_loop_len (loop_vinfo, gsi,
> > + &LOOP_VINFO_LENS (loop_vinfo), 1,
> > + vectype, 0, 0);
> > +   else
> > +     control = vect_get_loop_mask (loop_vinfo, gsi,
> > +   &LOOP_VINFO_MASKS (loop_vinfo), 1,
> > +   vectype, 0);
> >    tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> > -   mask, vec_lhs_phi);
> > +   control, vec_lhs_phi);
> > 
> > Reuse the current codes (build EXTRACT_LAST with mask), build length instead if
> > 'LOOP_VINFO_FULLY_WITH_LENGTH_P' is true.
> > 
> > This patch has been fully tested in RISC-V port.
> > 
> > Bootstrap and Regression on X86 passed.
> > 
> > Ok for trunk ?
> > 
> > gcc/ChangeLog:
> > 
> >         * tree-vect-loop.cc (vectorizable_live_operation): Add length control.
> > 
> > ---
> >  gcc/tree-vect-loop.cc | 40 ++++++++++++++++++++++++++++------------
> >  1 file changed, 28 insertions(+), 12 deletions(-)
> > 
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 00058c3c13e..fde098cafde 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10311,9 +10311,15 @@ vectorizable_live_operation (vec_info *vinfo,
> >    else
> >      {
> >        gcc_assert (ncopies == 1 && !slp_node);
> > -       vect_record_loop_mask (loop_vinfo,
> > -      &LOOP_VINFO_MASKS (loop_vinfo),
> > -      1, vectype, NULL);
> > +       machine_mode vec_mode = TYPE_MODE (vectype);
> > +       if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
> > + vect_record_loop_len (loop_vinfo,
> > +       &LOOP_VINFO_LENS (loop_vinfo), 1,
> > +       vectype, 1);
> > +       else
> > + vect_record_loop_mask (loop_vinfo,
> > +        &LOOP_VINFO_MASKS (loop_vinfo), 1,
> > +        vectype, NULL);
> >      }
> >  }
> >        /* ???  Enable for loop costing as well.  */
> > @@ -10339,7 +10345,9 @@ vectorizable_live_operation (vec_info *vinfo,
> >    gimple *vec_stmt;
> >    if (slp_node)
> >      {
> > -      gcc_assert (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
> > +      gcc_assert (!loop_vinfo
> > +   || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> > +   || !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo));
>  
> that should be
>  
>   || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) 
>       && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>  
> I think.  It seems to imply that SLP isn't supported with 
> masking/lengthing.
>  
> >  
> >        /* Get the correct slp vectorized stmt.  */
> >        vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry];
> > @@ -10383,21 +10391,29 @@ vectorizable_live_operation (vec_info *vinfo,
> >  
> >        gimple_seq stmts = NULL;
> >        tree new_tree;
> > -      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > +      if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> > +   || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> >  {
> >    /* Emit:
> >  
> > -        SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> > +        SCALAR_RES = EXTRACT_LAST <VEC_LHS, CONTROL>
> >  
> > -      where VEC_LHS is the vectorized live-out result and MASK is
> > -      the loop mask for the final iteration.  */
> > +      where VEC_LHS is the vectorized live-out result and CONTROL can
> > +      be either the loop mask for the final iteration or the loop len
> > +      for the final iteration.  */
> >    gcc_assert (ncopies == 1 && !slp_node);
> >    tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
> > -   tree mask = vect_get_loop_mask (loop_vinfo, gsi,
> > -   &LOOP_VINFO_MASKS (loop_vinfo),
> > -   1, vectype, 0);
> > +   tree control;
> > +   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > +     control = vect_get_loop_len (loop_vinfo, gsi,
> > + &LOOP_VINFO_LENS (loop_vinfo), 1,
> > + vectype, 0, 0);
> > +   else
> > +     control = vect_get_loop_mask (loop_vinfo, gsi,
> > +   &LOOP_VINFO_MASKS (loop_vinfo), 1,
> > +   vectype, 0);
> >    tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> > -   mask, vec_lhs_phi);
> > +   control, vec_lhs_phi);
>  
> Hum, how does CFN_EXTRACT_LAST handle both mask and length transparently?
> Don't you need some CFN_LEN_EXTRACT_LAST instead?
>  
> >    /* Convert the extracted vector element to the scalar type.  */
> >    new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> > 
>  
> Otherwise looks OK to me.
>  
> Thanks,
> Richard.
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

  reply	other threads:[~2023-08-09 12:21 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-09  6:36 juzhe.zhong
2023-08-09 11:00 ` Richard Biener
2023-08-09 11:15   ` juzhe.zhong
2023-08-09 12:21     ` Richard Biener [this message]
2023-08-09 12:34       ` juzhe.zhong
2023-08-09 13:30     ` Richard Sandiford
2023-08-09 14:11       ` 钟居哲
2023-08-09 14:17         ` Richard Biener
2023-08-09 14:35           ` 钟居哲

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.77.849.2308091201530.12935@jbgna.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=juzhe.zhong@rivai.ai \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).