public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: jeffreyalaw <jeffreyalaw@gmail.com>,
	 gcc-patches <gcc-patches@gcc.gnu.org>
Cc: richard.sandiford <richard.sandiford@arm.com>,
	 rguenther <rguenther@suse.de>
Subject: Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support
Date: Tue, 9 May 2023 18:52:40 +0800	[thread overview]
Message-ID: <316D3FB218A3B4F8+20230509185239581802111@rivai.ai> (raw)
In-Reply-To: <62a49c62-8632-baff-c3d6-c4277fd669ca@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 8160 bytes --]

Hi,Richards. Would you mind reviewing this patch?

Thanks.


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-05-07 23:19
To: juzhe.zhong; gcc-patches
CC: richard.sandiford; rguenther
Subject: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support
 
 
On 5/4/23 07:25, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> This patch is fixing V3 patch:
> https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zhong@rivai.ai/
> 
> Fix issues according to Richard Sandiford && Richard Biener.
> 
> 1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
> 2. Support multiple-rgroup for non-SLP auto-vectorization.
> 
>     For vec_pack_trunc pattern (multi-rgroup of non-SLP), we generate the total length:
> 
>       _36 = MIN_EXPR <ivtmp_34, POLY_INT_CST [8, 8]>;
> 
>       First length (MIN (X, VF/N)):
>         loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
> 
>       Second length (X - MIN (X, 1 * VF/N)):
>         loop_len_16 = _36 - loop_len_15;
> 
>       Third length (X - MIN (X, 2 * VF/N)):
>         _38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
>         loop_len_17 = _36 - _38;
> 
>       Forth length (X - MIN (X, 3 * VF/N)):
>         _39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
>         loop_len_18 = _36 - _39;
> 
> The reason that I use MIN_EXPR instead of SELECT_VL to calculate total length since using SELECT_VL
> to adapt induction IV consumes more instructions than just using MIN_EXPR. Also, during testing,
> I found it's hard to adjust length correctly according to SELECT_VL.
> 
> So, this patch we only use SELECT_VL for single-rgroup with single length control.
> 
> 3. Fix document of select_vl for Richard Biener (remove mode N).
> 4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard Biener.
> 5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
> 6. make requirement of get_while_len_data_ref_ptr outside, let it to be gated at the caller site.
> 
> More comments from Richard Biener:
>>> So it's not actually saturating.  The saturating operation is done by .WHILE_LEN?
> I define the outcome of SELECT_VL (n, vf)  (WHILE_LEN) = IN_RANGE (0, min (n, vf)) will make
> the loop control counter never underflow zero.
> 
>>> I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
>>> the above scheme should also work for single rgroups, no?
>>> As said, it _looks_ like you can progress without .WHILE_LEN and using
>>> .WHILE_LEN is a pure optimization?
> Yes, SELECT_VL (WHILE_LEN) is pure optimization for single-rgroup and allow
> target adjust any length = INRANGE (0, min (n, vf)) each iteration.
> 
> Let me known if I missed something for the V3 patch.
So at a high level this is pretty good.  I think there's some 
improvements we should make in the documentation and comments, but I'm 
comfortable with most of the implementation details.
 
 
 
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cc4a93a8763..99cf0cdbdca 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++)
>     operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>   @end smallexample
>   
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of active elements in vector will be updated value.
This reads rather poorly.  Is this still accurate?
 
Set operand 0 to the number of active elements in a vector to be updated 
in a loop iteration based on the total number of elements to be updated, 
the vectorization factor and vector properties of the target.
 
 
> +operand 1 is the total elements need to be updated value.
operand 1 is the total elements in the vector to be updated.
 
 
> +
> +The output of this pattern is not only used as IV of loop control counter, but also
> +is used as the IV of address calculation with multiply/shift operation. This allow
> +us dynamic adjust the number of elements is processed in each iteration of the loop.
This allows dynamic adjustment of the number of elements processed each 
loop iteration. -- is that still accurate and does it read better?
 
 
> @@ -47,7 +47,9 @@ along with GCC; see the file COPYING3.  If not see
>      so that we can free them all at once.  */
>   static bitmap_obstack loop_renamer_obstack;
>   
> -/* Creates an induction variable with value BASE + STEP * iteration in LOOP.
> +/* Creates an induction variable with value BASE (+/-) STEP * iteration in LOOP.
> +   If CODE is PLUS_EXPR, the induction variable is BASE + STEP * iteration.
> +   If CODE is MINUS_EXPR, the induction variable is BASE - STEP * iteration.
>      It is expected that neither BASE nor STEP are shared with other expressions
>      (unless the sharing rules allow this).  Use VAR as a base var_decl for it
>      (if NULL, a new temporary will be created).  The increment will occur at
It's been pretty standard to stick with just PLUS_EXPR for this stuff 
and instead negate the constant to produce the same effect as 
MINUS_EXPR.  Is there a reason we're not continuing that practice? 
Sorry if you've answered this already -- if you have, you can just point 
me at the prior discussion and I'll read it.
 
 
 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 44bd5f2c805..d63ded5d4f0 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -385,6 +385,48 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_controls *dest_rgm,
>     return false;
>   }
>   
> +/* Try to use permutes to define the lens in DEST_RGM using the lens
> +   in SRC_RGM, given that the former has twice as many lens as the
> +   latter.  Return true on success, adding any new statements to SEQ.  */
I would suggest not using "permute" in this description.  When I read 
permute in the context of vectorization, I think of a vector permute to 
scramble elements within a vector.
 
This looks like you're just adjusting how many vector elements you're 
operating on.
 
> + {
> +   /* For SLP, we can't allow non-VF number of elements to be processed
> +      in non-final iteration. We force the number of elements to be
> +    processed in each non-final iteration is VF elements. If we allow
> +    non-VF elements processing in non-final iteration will make SLP too
> +    complicated and produce inferior codegen.
Looks like you may have mixed up spaces and tabs in the above comment. 
Just a nit, but let's go ahead and get it fixed.
 
> @@ -703,6 +1040,10 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>   
>     bool use_masks_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>     tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> +  bool use_vl_p = !use_masks_p
> +   && direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
> +      OPTIMIZE_FOR_SPEED);
When you break a line with a logical like this, go ahead and add 
parenthesis and make sure the logical aligns just after the paren.  ie
 
bool use_vl_p = (!use_masks_p
&& direct....
 
 
Alternately, compute the direct_itnernal_fn_supported_p into its own 
boolean and then you don't need as much line wrapping.
 
In general, don't be afraid to use extra temporaries if doing so 
improves readability.
 
 
 
 
 
> +   else if (loop_lens && loop_lens->length () == 1
> +    && direct_internal_fn_supported_p (
> +      IFN_SELECT_VL, LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo),
> +      OPTIMIZE_FOR_SPEED)
> +    && memory_access_type != VMAT_INVARIANT)
This looks like a good example of code that would be easier to read if 
the call to direct_internal-fn_supported_p was saved into a temporary. 
Similarly for the instance you added in vectorizable_load.
 
 
I'd like to get this patch wrapped up soon.   But I also want to give 
both Richards a chance to chime in with their concerns.
 
Thanks,
 
Jeff
 

  parent reply	other threads:[~2023-05-09 10:52 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-04 13:25 juzhe.zhong
2023-05-05 23:41 ` 钟居哲
2023-05-09 12:59   ` Richard Sandiford
2023-05-09 13:27     ` juzhe.zhong
2023-05-09 14:34       ` 钟居哲
2023-05-07 15:19 ` Jeff Law
2023-05-07 21:54   ` 钟居哲
2023-05-09 10:52   ` juzhe.zhong [this message]
2023-05-08  5:35 ` Kewen.Lin
2023-05-08  6:27   ` juzhe.zhong
2023-05-08  7:55     ` Kewen.Lin
2023-05-08  8:25       ` juzhe.zhong
2023-05-10 16:45 ` Richard Sandiford
2023-05-10 21:00   ` 钟居哲
2023-05-10 21:28     ` Richard Sandiford
2023-05-10 22:51       ` 钟居哲
2023-05-11  4:50         ` Richard Sandiford
2023-05-11  5:14           ` juzhe.zhong
2023-05-11 10:11   ` juzhe.zhong
2023-05-11 11:04     ` Richard Sandiford
2023-05-11 11:21       ` juzhe.zhong
2023-05-11 11:29         ` Richard Sandiford
2023-05-11 12:08           ` juzhe.zhong
2023-05-11 12:42             ` Richard Sandiford
2023-05-11 23:12               ` 钟居哲

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=316D3FB218A3B4F8+20230509185239581802111@rivai.ai \
    --to=juzhe.zhong@rivai.ai \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).