Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

From: Richard Sandiford <richard.sandiford@arm.com>
To: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	 rguenther <rguenther@suse.de>,
	 jeffreyalaw <jeffreyalaw@gmail.com>
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
Date: Thu, 20 Apr 2023 10:11:02 +0100
Message-ID: <mptzg72dil5.fsf@arm.com>
In-Reply-To: <0AAC8392021A279F+2023042016574579562748@rivai.ai> (juzhe's message of "Thu, 20 Apr 2023 16:57:46 +0800")

"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai> writes:
> Thanks, Richard, for reminding me. I originally thought the community would not let me support variable-amount IVs and wanted me to do this in the RISC-V backend.

No, I think that part should and needs to be done in the middle-end,
since if the initial IVs are incorrect, it's very difficult to fix
them up later.

But with the patch as originally presented, WHILE_LEN was just a
simple minimum operation, with only the final iteration being partial.
It didn't make sense IMO for that to be its own IFN.  It was only later
that you said that non-final iterations might be partial too.
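
To illustrate the difference, here is a minimal C sketch (not code from
the patch).  while_len_min models WHILE_LEN as a plain MIN.
while_len_balanced is a hypothetical model of a target that is allowed
to shorten a non-final iteration, loosely following the RVV rule that vl
may be as small as ceil(AVL/2) when AVL < 2*VLMAX.  All function names
and the fixed_step switch are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: WHILE_LEN as a plain minimum.  */
static size_t while_len_min(size_t remaining, size_t vf)
{
    return remaining < vf ? remaining : vf;
}

/* Hypothetical model of a target that may shorten a non-final
   iteration: when vf < remaining < 2 * vf it splits the tail evenly.  */
static size_t while_len_balanced(size_t remaining, size_t vf)
{
    if (remaining <= vf)
        return remaining;
    if (remaining < 2 * vf)
        return (remaining + 1) / 2;
    return vf;
}

typedef size_t (*while_len_fn)(size_t remaining, size_t vf);

/* Copy n ints using a decrementing IV.  If fixed_step is nonzero the
   pointers advance by vf on every iteration (as fixed-step pointer IVs
   would); otherwise they advance by the length actually processed.  */
static void copy_loop(int *dest, const int *src, size_t n, size_t vf,
                      while_len_fn while_len, int fixed_step)
{
    size_t remaining = n;
    while (remaining != 0) {
        size_t len = while_len(remaining, vf);
        assert(len != 0 && len <= remaining);  /* must make progress */
        memcpy(dest, src, len * sizeof *src);  /* stands in for len_load/len_store */
        remaining -= len;
        size_t step = fixed_step ? vf : len;
        src += step;
        dest += step;
    }
}
```

With n = 6 and vf = 4, the MIN model yields lengths 4 and 2, so stepping
the pointers by vf is harmless.  The balanced model yields 3 and 3;
stepping by vf then skips element 3 and touches element 6, which is why
the pointer IVs would have to advance by len as well.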

And there was pushback against WHILE_LEN having an effect on global
state, rather than being a simple "how many elements should I process?"
calculation.  That last bit -- the global effect of VSETVL -- was the bit
that needed to be kept local to the RISC-V backend.
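
A toy C model of that split, under made-up names (select_len, set_vl,
vl_csr and copy_vl are illustrative only, not GCC or RISC-V interfaces),
might look like this.  It is only a sketch of the distinction between a
pure length calculation and an operation that also changes hidden state.

```c
#include <assert.h>
#include <stddef.h>

/* Pure: the result depends only on the arguments, so calls can be
   merged, moved or deleted like any other arithmetic.  */
static size_t select_len(size_t remaining, size_t vf)
{
    return remaining < vf ? remaining : vf;
}

/* Toy stand-in for a vector-length register (hypothetical).  */
static size_t vl_csr;

/* Impure: returns the same value as select_len but also updates
   vl_csr, which later "vector" operations read implicitly.  */
static size_t set_vl(size_t remaining, size_t vf)
{
    vl_csr = select_len(remaining, vf);
    assert(vl_csr <= vf);  /* sanity check: never longer than vf */
    return vl_csr;
}

/* Toy "vector" copy whose length comes from vl_csr, not an argument.  */
static void copy_vl(int *dest, const int *src)
{
    for (size_t i = 0; i < vl_csr; i++)
        dest[i] = src[i];
}
```

Because select_len has no side effects, the middle-end can treat it like
any other calculation, which matches the ECF_CONST flag given to
WHILE_LEN in internal-fn.def in the quoted patch.  Only something like
set_vl, whose effect on vl_csr is observed by copy_vl, needs
backend-specific handling.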

Thanks,
Richard

> It seems that I can do that in the middle-end. Thank you so much. I will update the patch. I really appreciate it!
>
>
>
> juzhe.zhong@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-04-20 16:52
> To: 钟居哲
> CC: gcc-patches; rguenther; Jeff Law
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
> 钟居哲 <juzhe.zhong@rivai.ai> writes:
>> Hi, Richards.
>> GCC 14 is now open, and this patch has been bootstrapped and tested on x86.
>> Is this patch, which supports a variable IV, OK for trunk?
>  
> Doesn't the patch need updating based on the previous discussion?
> I thought the outcome was that WHILE_LEN isn't a simple MIN operation
> (contrary to the documentation in the patch) and that pointer IVs
> would also need to be updated by a variable amount, given that even
> non-final iterations might process fewer than VF elements.
>  
> Thanks,
> Richard
>  
>> juzhe.zhong@rivai.ai
>>  
>> From: juzhe.zhong
>> Date: 2023-04-07 09:47
>> To: gcc-patches
>> CC: richard.sandiford; rguenther; jeffreyalaw; Juzhe-Zhong
>> Subject: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
>> From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
>>  
>> This patch adds the WHILE_LEN pattern.
>> It is inspired by the simple "vvaddint32.s" example in the RVV ISA spec:
>> https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s
>>  
>> More details are in the implementation of
>> "vect_set_loop_controls_by_while_len" and its comments.
>>  
>> Consider the following case:
>> #define N 16
>> int src[N];
>> int dest[N];
>>  
>> void
>> foo (int n)
>> {
>>   for (int i = 0; i < n; i++)
>>     dest[i] = src[i];
>> }
>>  
>> -march=rv64gcv -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -fno-tree-loop-distribute-patterns:
>>  
>> foo:        
>>         ble     a0,zero,.L1
>>         lui     a4,%hi(.LANCHOR0)
>>         addi    a4,a4,%lo(.LANCHOR0)
>>         addi    a3,a4,64
>>         csrr    a2,vlenb
>> .L3:
>>         vsetvli a5,a0,e32,m1,ta,ma
>>         vle32.v v1,0(a4)
>>         sub     a0,a0,a5
>>         vse32.v v1,0(a3)
>>         add     a4,a4,a2
>>         add     a3,a3,a2
>>         bne     a0,zero,.L3
>> .L1:
>>         ret
>>  
>> gcc/ChangeLog:
>>  
>>         * doc/md.texi: Add WHILE_LEN support.
>>         * internal-fn.cc (while_len_direct): Ditto.
>>         (expand_while_len_optab_fn): Ditto.
>>         (direct_while_len_optab_supported_p): Ditto.
>>         * internal-fn.def (WHILE_LEN): Ditto.
>>         * optabs.def (OPTAB_D): Ditto.
>>         * tree-ssa-loop-manip.cc (create_iv): Ditto.
>>         * tree-ssa-loop-manip.h (create_iv): Ditto.
>>         * tree-vect-loop-manip.cc (vect_set_loop_controls_by_while_len): Ditto.
>>         (vect_set_loop_condition_partial_vectors): Ditto.
>>         * tree-vect-loop.cc (vect_get_loop_len): Ditto.
>>         * tree-vect-stmts.cc (vectorizable_store): Ditto.
>>         (vectorizable_load): Ditto.
>>         * tree-vectorizer.h (vect_get_loop_len): Ditto.
>>  
>> ---
>> gcc/doc/md.texi             |  14 +++
>> gcc/internal-fn.cc          |  29 ++++++
>> gcc/internal-fn.def         |   1 +
>> gcc/optabs.def              |   1 +
>> gcc/tree-ssa-loop-manip.cc  |   4 +-
>> gcc/tree-ssa-loop-manip.h   |   2 +-
>> gcc/tree-vect-loop-manip.cc | 186 ++++++++++++++++++++++++++++++++++--
>> gcc/tree-vect-loop.cc       |  35 +++++--
>> gcc/tree-vect-stmts.cc      |   9 +-
>> gcc/tree-vectorizer.h       |   4 +-
>> 10 files changed, 264 insertions(+), 21 deletions(-)
>>  
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 8e3113599fd..72178ab014c 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -4965,6 +4965,20 @@ for (i = 1; i < operand3; i++)
>>    operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>> @end smallexample
>> +@cindex @code{while_len@var{m}@var{n}} instruction pattern
>> +@item @code{while_len@var{m}@var{n}}
>> +Set operand 0 to the number of elements that the current vector iteration
>> +will process.  Operand 1 is the total number of elements that remain to be
>> +processed and operand 2 is the vectorization factor.
>> +The operation is equivalent to:
>> +
>> +@smallexample
>> +operand0 = MIN (operand1, operand2);
>> +@end smallexample
>> +Operand 2 can be a const_poly_int or a poly_int related to the vector mode
>> +size.  Some targets, such as RISC-V, have a standalone instruction that
>> +computes MIN (n, MODE SIZE), which saves a general-purpose register.
>> +
>> @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>> @item @samp{check_raw_ptrs@var{m}}
>> Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 6e81dc05e0e..5f44def90d3 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -127,6 +127,7 @@ init_internal_fns ()
>> #define cond_binary_direct { 1, 1, true }
>> #define cond_ternary_direct { 1, 1, true }
>> #define while_direct { 0, 2, false }
>> +#define while_len_direct { 0, 0, false }
>> #define fold_extract_direct { 2, 2, false }
>> #define fold_left_direct { 1, 1, false }
>> #define mask_fold_left_direct { 1, 1, false }
>> @@ -3702,6 +3703,33 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>>      emit_move_insn (lhs_rtx, ops[0].value);
>> }
>> +/* Expand WHILE_LEN call STMT using optab OPTAB.  */
>> +static void
>> +expand_while_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>> +{
>> +  expand_operand ops[3];
>> +  tree rhs_type[2];
>> +
>> +  tree lhs = gimple_call_lhs (stmt);
>> +  tree lhs_type = TREE_TYPE (lhs);
>> +  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> +  create_output_operand (&ops[0], lhs_rtx, TYPE_MODE (lhs_type));
>> +
>> +  for (unsigned int i = 0; i < gimple_call_num_args (stmt); ++i)
>> +    {
>> +      tree rhs = gimple_call_arg (stmt, i);
>> +      rhs_type[i] = TREE_TYPE (rhs);
>> +      rtx rhs_rtx = expand_normal (rhs);
>> +      create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
>> +    }
>> +
>> +  insn_code icode = direct_optab_handler (optab, TYPE_MODE (rhs_type[0]));
>> +
>> +  expand_insn (icode, 3, ops);
>> +  if (!rtx_equal_p (lhs_rtx, ops[0].value))
>> +    emit_move_insn (lhs_rtx, ops[0].value);
>> +}
>> +
>> /* Expand a call to a convert-like optab using the operands in STMT.
>>     FN has a single output operand and NARGS input operands.  */
>> @@ -3843,6 +3871,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>> #define direct_scatter_store_optab_supported_p convert_optab_supported_p
>> #define direct_len_store_optab_supported_p direct_optab_supported_p
>> #define direct_while_optab_supported_p convert_optab_supported_p
>> +#define direct_while_len_optab_supported_p direct_optab_supported_p
>> #define direct_fold_extract_optab_supported_p direct_optab_supported_p
>> #define direct_fold_left_optab_supported_p direct_optab_supported_p
>> #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
>> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
>> index 7fe742c2ae7..3a933abff5d 100644
>> --- a/gcc/internal-fn.def
>> +++ b/gcc/internal-fn.def
>> @@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
>> DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
>> DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
>> +DEF_INTERNAL_OPTAB_FN (WHILE_LEN, ECF_CONST | ECF_NOTHROW, while_len, while_len)
>> DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
>>        check_raw_ptrs, check_ptrs)
>> DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>> index 695f5911b30..f5938bd2c24 100644
>> --- a/gcc/optabs.def
>> +++ b/gcc/optabs.def
>> @@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
>> OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
>> OPTAB_D (len_load_optab, "len_load_$a")
>> OPTAB_D (len_store_optab, "len_store_$a")
>> +OPTAB_D (while_len_optab, "while_len$a")
>> diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc
>> index 09acc1c94cc..cdbf280e249 100644
>> --- a/gcc/tree-ssa-loop-manip.cc
>> +++ b/gcc/tree-ssa-loop-manip.cc
>> @@ -59,14 +59,14 @@ static bitmap_obstack loop_renamer_obstack;
>> void
>> create_iv (tree base, tree step, tree var, class loop *loop,
>>    gimple_stmt_iterator *incr_pos, bool after,
>> -    tree *var_before, tree *var_after)
>> +    tree *var_before, tree *var_after, enum tree_code code)
>> {
>>    gassign *stmt;
>>    gphi *phi;
>>    tree initial, step1;
>>    gimple_seq stmts;
>>    tree vb, va;
>> -  enum tree_code incr_op = PLUS_EXPR;
>> +  enum tree_code incr_op = code;
>>    edge pe = loop_preheader_edge (loop);
>>    if (var != NULL_TREE)
>> diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
>> index d49273a3987..da755320a3a 100644
>> --- a/gcc/tree-ssa-loop-manip.h
>> +++ b/gcc/tree-ssa-loop-manip.h
>> @@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>> typedef void (*transform_callback)(class loop *, void *);
>> extern void create_iv (tree, tree, tree, class loop *, gimple_stmt_iterator *,
>> -        bool, tree *, tree *);
>> +        bool, tree *, tree *, enum tree_code = PLUS_EXPR);
>> extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
>> extern void verify_loop_closed_ssa (bool, class loop * = NULL);
>> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
>> index f60fa50e8f4..f3cd6c51d2e 100644
>> --- a/gcc/tree-vect-loop-manip.cc
>> +++ b/gcc/tree-vect-loop-manip.cc
>> @@ -682,6 +682,173 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
>>    return next_ctrl;
>> }
>> +/* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
>> +   for all the rgroup controls in RGC and return a control that is nonzero
>> +   when the loop needs to iterate.  Add any new preheader statements to
>> +   PREHEADER_SEQ.  Use LOOP_COND_GSI to insert code before the exit gcond.
>> +
>> +   RGC belongs to loop LOOP.  The loop originally iterated NITERS
>> +   times and has been vectorized according to LOOP_VINFO.
>> +
>> +   vect_set_loop_controls_directly iterates a 0-based IV up to
>> +   TEST_LIMIT - bias.
>> +
>> +   vect_set_loop_controls_by_while_len instead starts the IV at
>> +   TEST_LIMIT - bias and repeatedly subtracts the length calculated by
>> +   the IFN_WHILE_LEN pattern.
>> +
>> +   Note: the cost of the code generated by this function is modeled
>> +   by vect_estimate_min_profitable_iters, so changes here may need
>> +   corresponding changes there.
>> +
>> +   1. Single rgroup, the Gimple IR should be:
>> +
>> + <bb 3>
>> + _19 = (unsigned long) n_5(D);
>> + ...
>> +
>> + <bb 4>:
>> + ...
>> + # ivtmp_20 = PHI <ivtmp_21(4), _19(3)>
>> + ...
>> + _22 = .WHILE_LEN (ivtmp_20, vf);
>> + ...
>> + vector statement (use _22);
>> + ...
>> + ivtmp_21 = ivtmp_20 - _22;
>> + ...
>> + if (ivtmp_21 != 0)
>> +   goto <bb 4>; [75.00%]
>> + else
>> +   goto <bb 5>; [25.00%]
>> +
>> + <bb 5>
>> + return;
>> +
>> +   Note: IFN_WHILE_LEN guarantees that "ivtmp_21 = ivtmp_20 - _22" never
>> +   wraps below 0.
>> +
>> +   2. Multiple rgroup, the Gimple IR should be:
>> +
>> + <bb 3>
>> + _70 = (unsigned long) bnd.7_52;
>> + _71 = _70 * 2;
>> + _72 = MAX_EXPR <_71, 4>;
>> + _73 = _72 + 18446744073709551612;
>> + ...
>> +
>> + <bb 4>:
>> + ...
>> + # ivtmp_74 = PHI <ivtmp_75(6), _73(12)>
>> + # ivtmp_77 = PHI <ivtmp_78(6), _71(12)>
>> + _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
>> + _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);
>> + ...
>> + vector statement (use _79);
>> + ...
>> + vector statement (use _76);
>> + ...
>> + _65 = _79 / 2;
>> + vector statement (use _65);
>> + ...
>> + _68 = _76 / 2;
>> + vector statement (use _68);
>> + ...
>> + ivtmp_78 = ivtmp_77 - _79;
>> + ivtmp_75 = ivtmp_74 - _76;
>> + ...
>> + if (ivtmp_78 != 0)
>> +   goto <bb 4>; [75.00%]
>> + else
>> +   goto <bb 5>; [25.00%]
>> +
>> + <bb 5>
>> + return;
>> +
>> +*/
>> +
>> +static tree
>> +vect_set_loop_controls_by_while_len (class loop *loop, loop_vec_info loop_vinfo,
>> +      gimple_seq *preheader_seq,
>> +      gimple_seq *header_seq,
>> +      rgroup_controls *rgc, tree niters)
>> +{
>> +  tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>> +  /* The masked approach is not supported with WHILE_LEN.  */
>> +  gcc_assert (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
>> +
>> +  tree ctrl_type = rgc->type;
>> +  unsigned int nitems_per_iter = rgc->max_nscalars_per_iter * rgc->factor;
>> +  poly_uint64 nitems_per_ctrl = TYPE_VECTOR_SUBPARTS (ctrl_type) * rgc->factor;
>> +  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +
>> +  /* Calculate the maximum number of item values that the rgroup
>> +     handles in total, the number that it handles for each iteration
>> +     of the vector loop.  */
>> +  tree nitems_total = niters;
>> +  if (nitems_per_iter != 1)
>> +    {
>> +      /* We checked before setting LOOP_VINFO_USING_PARTIAL_VECTORS_P that
>> + these multiplications don't overflow.  */
>> +      tree compare_factor = build_int_cst (compare_type, nitems_per_iter);
>> +      nitems_total = gimple_build (preheader_seq, MULT_EXPR, compare_type,
>> +    nitems_total, compare_factor);
>> +    }
>> +
>> +  /* Convert the comparison value to the IV type (either a no-op or
>> +     a promotion).  */
>> +  nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
>> +
>> +  /* Create an induction variable that counts the number of items
>> +     processed.  */
>> +  tree index_before_incr, index_after_incr;
>> +  gimple_stmt_iterator incr_gsi;
>> +  bool insert_after;
>> +  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
>> +
>> +  /* Test the decremented IV, which will never wrap below 0 because
>> +     IFN_WHILE_LEN guarantees that.  */
>> +  tree test_limit = nitems_total;
>> +
>> +  /* Provide a definition of each control in the group.  */
>> +  tree ctrl;
>> +  unsigned int i;
>> +  FOR_EACH_VEC_ELT_REVERSE (rgc->controls, i, ctrl)
>> +    {
>> +      /* Previous controls will cover BIAS items.  This control covers the
>> + next batch.  */
>> +      poly_uint64 bias = nitems_per_ctrl * i;
>> +      tree bias_tree = build_int_cst (iv_type, bias);
>> +
>> +      /* Rather than have a new IV that starts at TEST_LIMIT and goes down to
>> + BIAS, prefer to use the same TEST_LIMIT - BIAS based IV for each
>> + control and adjust the bound down by BIAS.  */
>> +      tree this_test_limit = test_limit;
>> +      if (i != 0)
>> + {
>> +   this_test_limit = gimple_build (preheader_seq, MAX_EXPR, iv_type,
>> +   this_test_limit, bias_tree);
>> +   this_test_limit = gimple_build (preheader_seq, MINUS_EXPR, iv_type,
>> +   this_test_limit, bias_tree);
>> + }
>> +
>> +      /* Create decrement IV.  */
>> +      create_iv (this_test_limit, ctrl, NULL_TREE, loop, &incr_gsi,
>> + insert_after, &index_before_incr, &index_after_incr,
>> + MINUS_EXPR);
>> +
>> +      poly_uint64 final_vf = vf * nitems_per_iter;
>> +      tree vf_step = build_int_cst (iv_type, final_vf);
>> +      tree res_len = gimple_build (header_seq, IFN_WHILE_LEN, iv_type,
>> +    index_before_incr, vf_step);
>> +      gassign *assign = gimple_build_assign (ctrl, res_len);
>> +      gimple_seq_add_stmt (header_seq, assign);
>> +    }
>> +
>> +  return index_after_incr;
>> +}
>> +
>> /* Set up the iteration condition and rgroup controls for LOOP, given
>>     that LOOP_VINFO_USING_PARTIAL_VECTORS_P is true for the vectorized
>>     loop.  LOOP_VINFO describes the vectorization of LOOP.  NITERS is
>> @@ -703,6 +870,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>>    bool use_masks_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>>    tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>>    unsigned int compare_precision = TYPE_PRECISION (compare_type);
>>    tree orig_niters = niters;
>> @@ -757,12 +925,18 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>> bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
>> /* Set up all controls for this group.  */
>> - test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
>> -      &preheader_seq,
>> -      &header_seq,
>> -      loop_cond_gsi, rgc,
>> -      niters, niters_skip,
>> -      might_wrap_p);
>> + if (direct_internal_fn_supported_p (IFN_WHILE_LEN, iv_type,
>> +     OPTIMIZE_FOR_SPEED))
>> +   test_ctrl
>> +     = vect_set_loop_controls_by_while_len (loop, loop_vinfo,
>> +    &preheader_seq, &header_seq,
>> +    rgc, niters);
>> + else
>> +   test_ctrl
>> +     = vect_set_loop_controls_directly (loop, loop_vinfo, &preheader_seq,
>> +        &header_seq, loop_cond_gsi, rgc,
>> +        niters, niters_skip,
>> +        might_wrap_p);
>>        }
>>    /* Emit all accumulated statements.  */
>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> index 1ba9f18d73e..5bffd9a6322 100644
>> --- a/gcc/tree-vect-loop.cc
>> +++ b/gcc/tree-vect-loop.cc
>> @@ -10360,12 +10360,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>>     rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>> tree
>> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>> -    unsigned int nvectors, unsigned int index)
>> +vect_get_loop_len (gimple_stmt_iterator *gsi, loop_vec_info loop_vinfo,
>> +    vec_loop_lens *lens, unsigned int nvectors, tree vectype,
>> +    unsigned int index)
>> {
>>    rgroup_controls *rgl = &(*lens)[nvectors - 1];
>> -  bool use_bias_adjusted_len =
>> -    LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
>> +  bool use_bias_adjusted_len
>> +    = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
>> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>>    /* Populate the rgroup's len array, if this is the first time we've
>>       used it.  */
>> @@ -10386,8 +10388,8 @@ vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>>   if (use_bias_adjusted_len)
>>     {
>>       gcc_assert (i == 0);
>> -       tree adjusted_len =
>> - make_temp_ssa_name (len_type, NULL, "adjusted_loop_len");
>> +       tree adjusted_len
>> + = make_temp_ssa_name (len_type, NULL, "adjusted_loop_len");
>>       SSA_NAME_DEF_STMT (adjusted_len) = gimple_build_nop ();
>>       rgl->bias_adjusted_ctrl = adjusted_len;
>>     }
>> @@ -10396,6 +10398,27 @@ vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>>    if (use_bias_adjusted_len)
>>      return rgl->bias_adjusted_ctrl;
>> +  else if (direct_internal_fn_supported_p (IFN_WHILE_LEN, iv_type,
>> +    OPTIMIZE_FOR_SPEED))
>> +    {
>> +      tree loop_len = rgl->controls[index];
>> +      poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
>> +      poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
>> +      if (maybe_ne (nunits1, nunits2))
>> + {
>> +   /* A loop len for data type X can be reused for data type Y
>> +      if X has N times more elements than Y and if Y's elements
>> +      are N times bigger than X's.  */
>> +   gcc_assert (multiple_p (nunits1, nunits2));
>> +   unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
>> +   gimple_seq seq = NULL;
>> +   loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
>> +    build_int_cst (iv_type, factor));
>> +   if (seq)
>> +     gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
>> + }
>> +      return loop_len;
>> +    }
>>    else
>>      return rgl->controls[index];
>> }
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index efa2d0daa52..708c8a1d806 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -8653,8 +8653,9 @@ vectorizable_store (vec_info *vinfo,
>>       else if (loop_lens)
>> {
>>   tree final_len
>> -     = vect_get_loop_len (loop_vinfo, loop_lens,
>> - vec_num * ncopies, vec_num * j + i);
>> +     = vect_get_loop_len (gsi, loop_vinfo, loop_lens,
>> + vec_num * ncopies, vectype,
>> + vec_num * j + i);
>>   tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT);
>>   machine_mode vmode = TYPE_MODE (vectype);
>>   opt_machine_mode new_ovmode
>> @@ -10009,8 +10010,8 @@ vectorizable_load (vec_info *vinfo,
>>     else if (loop_lens && memory_access_type != VMAT_INVARIANT)
>>       {
>> tree final_len
>> -   = vect_get_loop_len (loop_vinfo, loop_lens,
>> -        vec_num * ncopies,
>> +   = vect_get_loop_len (gsi, loop_vinfo, loop_lens,
>> +        vec_num * ncopies, vectype,
>>        vec_num * j + i);
>> tree ptr = build_int_cst (ref_type,
>>   align * BITS_PER_UNIT);
>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>> index 9cf2fb23fe3..e5cf38caf4b 100644
>> --- a/gcc/tree-vectorizer.h
>> +++ b/gcc/tree-vectorizer.h
>> @@ -2293,8 +2293,8 @@ extern tree vect_get_loop_mask (gimple_stmt_iterator *, vec_loop_masks *,
>> unsigned int, tree, unsigned int);
>> extern void vect_record_loop_len (loop_vec_info, vec_loop_lens *, unsigned int,
>>   tree, unsigned int);
>> -extern tree vect_get_loop_len (loop_vec_info, vec_loop_lens *, unsigned int,
>> -        unsigned int);
>> +extern tree vect_get_loop_len (gimple_stmt_iterator *, loop_vec_info,
>> +        vec_loop_lens *, unsigned int, tree, unsigned int);
>> extern gimple_seq vect_gen_len (tree, tree, tree, tree);
>> extern stmt_vec_info info_for_reduction (vec_info *, stmt_vec_info);
>> extern bool reduction_fn_for_scalar_code (code_helper, internal_fn *);
>
