Re: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: richard.sandiford <richard.sandiford@arm.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,  rguenther <rguenther@suse.de>
Subject: Re: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
Date: Fri, 19 May 2023 18:37:04 +0800	[thread overview]
Message-ID: <39BD0264F732B3DE+2023051918370378242996@rivai.ai> (raw)
In-Reply-To: <mpt5y8otybb.fsf@arm.com>

[-- Attachment #1: Type: text/plain, Size: 7397 bytes --]

>> I don't think this is a property of decrementing IVs.  IIUC it's really
>> a property of rgl->factor == 1 && factor == 1, where factor would need
>> to be passed in by the caller.  Because of that, it should probably be
>> a separate patch.
Is it right that I just post this part code as a seperate patch then merge it?

>> That is, current LOAD_LEN targets have two properties (IIRC):
>> (1) all vectors used in a given piece of vector code have the same byte size
>> (2) lengths are measured in bytes rather than elements
>> For all cases, including SVE, the number of controls needed for a scalar
>> statement is equal to the number of vectors needed for that scalar
>> statement.
>> Because of (1), on current LOADL_LEN targets, the number of controls
>> needed for a scalar statement is also proportional to the total number
>> of bytes occupied by the vectors generated for that scalar statement.
>> And because of (2), the total number of bytes is the only thing that
>> matters, so all users of a particular control can use the same control
>> value.
>> E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
>> control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
>> means 16 elements.  V16QI's nscalars_per_iter would therefore be double
>> V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
>> even out.
>> The code structurally supports targets that count in elements rather
>> than bytes, so that factor==1 for all element types.  See the
>> "rgl->factor == 1 && factor == 1" case in:
 >>  if (rgl->max_nscalars_per_iter < nscalars_per_iter)  >>   {  >>     /* For now, we only support cases in which all loads and stores fall back	 to VnQI or none do.  */
   >>    gcc_assert (!rgl->max_nscalars_per_iter>> 		  || (rgl->factor == 1 && factor == 1)
		  || (rgl->max_nscalars_per_iter * rgl->factor
>> 		      == nscalars_per_iter * factor));
 >>      rgl->max_nscalars_per_iter = nscalars_per_iter; >>      rgl->type = vectype; >>      rgl->factor = factor;  >>   }>> But it hasn't been tested, since no current target uses it.
>> I think the above part of the patch shows that the current "factor is
>> always 1" path is in fact broken, and the patch is a correctness fix on
>> targets that measure in elements rather than bytes.
>> So I think the above part of the patch should go in ahead of the IV changes.
>> But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.Since the length control measured by bytes instead of bytes is not appropriate for RVV.You mean I can't support RVV auto-vectorization in upstream GCC middle-end and I can only support it in my downstream, is that right? 


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-19 18:23
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
Thanks for the update.  I'll split this review into two pieces.
Second piece to follow (not sure when, but hopefully soon).
 
juzhe.zhong@rivai.ai writes:
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..6f49bdee009 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>     rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -    unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +    vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +    unsigned int index)
>  {
>    rgroup_controls *rgl = &(*lens)[nvectors - 1];
>    bool use_bias_adjusted_len =
>      LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>  
>    /* Populate the rgroup's len array, if this is the first time we've
>       used it.  */
> @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>  
>    if (use_bias_adjusted_len)
>      return rgl->bias_adjusted_ctrl;
> +  else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +    {
> +      tree loop_len = rgl->controls[index];
> +      poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +      poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
> +      if (maybe_ne (nunits1, nunits2))
> + {
> +   /* A loop len for data type X can be reused for data type Y
> +      if X has N times more elements than Y and if Y's elements
> +      are N times bigger than X's.  */
> +   gcc_assert (multiple_p (nunits1, nunits2));
> +   unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
> +   gimple_seq seq = NULL;
> +   loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> +    build_int_cst (iv_type, factor));
> +   if (seq)
> +     gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> + }
> +      return loop_len;
> +    }
 
I don't think this is a property of decrementing IVs.  IIUC it's really
a property of rgl->factor == 1 && factor == 1, where factor would need
to be passed in by the caller.  Because of that, it should probably be
a separate patch.
 
That is, current LOAD_LEN targets have two properties (IIRC):
 
(1) all vectors used in a given piece of vector code have the same byte size
(2) lengths are measured in bytes rather than elements
 
For all cases, including SVE, the number of controls needed for a scalar
statement is equal to the number of vectors needed for that scalar
statement.
 
Because of (1), on current LOADL_LEN targets, the number of controls
needed for a scalar statement is also proportional to the total number
of bytes occupied by the vectors generated for that scalar statement.
And because of (2), the total number of bytes is the only thing that
matters, so all users of a particular control can use the same control
value.
 
E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
means 16 elements.  V16QI's nscalars_per_iter would therefore be double
V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
even out.
 
The code structurally supports targets that count in elements rather
than bytes, so that factor==1 for all element types.  See the
"rgl->factor == 1 && factor == 1" case in:
 
  if (rgl->max_nscalars_per_iter < nscalars_per_iter)
    {
      /* For now, we only support cases in which all loads and stores fall back
to VnQI or none do.  */
      gcc_assert (!rgl->max_nscalars_per_iter
  || (rgl->factor == 1 && factor == 1)
  || (rgl->max_nscalars_per_iter * rgl->factor
      == nscalars_per_iter * factor));
      rgl->max_nscalars_per_iter = nscalars_per_iter;
      rgl->type = vectype;
      rgl->factor = factor;
    }
 
But it hasn't been tested, since no current target uses it.
 
I think the above part of the patch shows that the current "factor is
always 1" path is in fact broken, and the patch is a correctness fix on
targets that measure in elements rather than bytes.
 
So I think the above part of the patch should go in ahead of the IV changes.
But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.
 
Thanks,
Richard

next prev parent reply	other threads:[~2023-05-19 10:37 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-16 10:23 juzhe.zhong
2023-05-18 11:27 ` Li, Pan2
2023-05-19 10:23 ` Richard Sandiford
2023-05-19 10:37   ` juzhe.zhong [this message]
2023-05-19 10:58     ` Richard Sandiford
2023-05-22 10:12       ` Richard Biener
2023-05-22 10:18         ` juzhe.zhong
2023-05-19 10:56   ` juzhe.zhong
2023-05-19 11:07     ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=39BD0264F732B3DE+2023051918370378242996@rivai.ai \
    --to=juzhe.zhong@rivai.ai \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).