public inbox for gcc-patches@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@arm.com>
To: "juzhe.zhong" <juzhe.zhong@rivai.ai>
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	 "kito.cheng\@gmail.com" <kito.cheng@gmail.com>,
	 "palmer\@dabbelt.com" <palmer@dabbelt.com>,
	 "richard.guenther\@gmail.com" <richard.guenther@gmail.com>
Subject: Re: [PATCH V6] VECT: Add decrement IV support in Loop Vectorizer
Date: Fri, 12 May 2023 14:25:59 +0100	[thread overview]
Message-ID: <mpt5y8xadg8.fsf@arm.com> (raw)
In-Reply-To: <A80AA82F40C90097+853F3055-CCC0-44A5-B9C8-A931F855E0FB@rivai.ai> (juzhe zhong's message of "Fri, 12 May 2023 20:41:35 +0800")

"juzhe.zhong" <juzhe.zhong@rivai.ai> writes:
> Hi, Richard.  For "can iterate more than once", is it correct to use the
> condition "LOOP_LENS ().length > 1"?

No, that says whether any LOAD_LENs or STORE_LENs operate on multiple
vectors, rather than just single vectors.

I meant: whether the vector loop body might be executed more than once
(i.e. whether the branch-back condition can be true).

This is true for a scalar loop that goes from 0 to some unbounded
variable n.  It's false for a scalar loop that goes from 0 to 6,
if the vectors are known to have at least 8 elements.

Thanks,
Richard

> ---- Replied Message ----
>
> From      Richard Sandiford<richard.sandiford@arm.com>
>
> Date      05/12/2023 19:39
>
> To        juzhe.zhong<juzhe.zhong@rivai.ai>
>
> Cc        gcc-patches@gcc.gnu.org<gcc-patches@gcc.gnu.org>,
>           kito.cheng@gmail.com<kito.cheng@gmail.com>,
>           palmer@dabbelt.com<palmer@dabbelt.com>,
>           richard.guenther@gmail.com<richard.guenther@gmail.com>
>
> Subject   Re: [PATCH V6] VECT: Add decrement IV support in Loop Vectorizer
>
> "juzhe.zhong" <juzhe.zhong@rivai.ai> writes:
>> Thanks Richard.
>>  I will do that as you suggested. I have a question about the first patch.
>> How do I enable decrement IV? Should I add a target hook or something to
>> let the target decide whether to enable decrement IV?
>
> At the moment, the only other targets that use IFN_LOAD_LEN and
> IFN_STORE_LEN are PowerPC and s390.  Both targets default to
> --param vect-partial-vector-usage=1 (i.e. use partial vectors
> for epilogues only).
>
> So I think the condition should be that the loop:
>
>  (a) uses length "controls"; and
>  (b) can iterate more than once
>
> No target checks should be needed.
>
> Thanks,
> Richard
>
>> ---- Replied Message ----
>>
>> From      Richard Sandiford<richard.sandiford@arm.com>
>>
>> Date      05/12/2023 19:08
>>
>> To        juzhe.zhong@rivai.ai<juzhe.zhong@rivai.ai>
>>
>> Cc        gcc-patches@gcc.gnu.org<gcc-patches@gcc.gnu.org>,
>>           kito.cheng@gmail.com<kito.cheng@gmail.com>,
>>           palmer@dabbelt.com<palmer@dabbelt.com>,
>>           richard.guenther@gmail.com<richard.guenther@gmail.com>
>>
>> Subject   Re: [PATCH V6] VECT: Add decrement IV support in Loop Vectorizer
>>
>> juzhe.zhong@rivai.ai writes:
>>> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
>>>
>>> 1. Fix document description according to Jeff && Richard.
>>> 2. Add LOOP_VINFO_USING_SELECT_VL_P for single rgroup.
>>> 3. Add LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P for SLP multiple rgroup.
>>>
>>> Fix bugs for V5 after testing:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618209.html
>>>
>>> gcc/ChangeLog:
>>>
>>>         * doc/md.texi: Add select_vl pattern.
>>>         * internal-fn.def (SELECT_VL): New ifn.
>>>         * optabs.def (OPTAB_D): New optab.
>>>         * tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
>>>         (vect_set_loop_controls_by_select_vl): Ditto.
>>>         (vect_set_loop_condition_partial_vectors): Add loop control for
>>>         decrement IV.
>>>         * tree-vect-loop.cc (vect_get_loop_len): Adjust loop len for SLP.
>>>         * tree-vect-stmts.cc (get_select_vl_data_ref_ptr): New function.
>>>         (vectorizable_store): Support data reference IV added by outcome
>>>         of SELECT_VL.
>>>         (vectorizable_load): Ditto.
>>>         * tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): New macro.
>>>         (LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P): Ditto.
>>>         (vect_get_loop_len): Adjust loop len for SLP.
>>>
>>> ---
>>>  gcc/doc/md.texi             |  36 ++++
>>>  gcc/internal-fn.def         |   1 +
>>>  gcc/optabs.def              |   1 +
>>>  gcc/tree-vect-loop-manip.cc | 380 +++++++++++++++++++++++++++++++++++-
>>>  gcc/tree-vect-loop.cc       |  31 ++-
>>>  gcc/tree-vect-stmts.cc      |  79 +++++++-
>>>  gcc/tree-vectorizer.h       |  12 +-
>>>  7 files changed, 526 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>>> index 8ebce31ba78..a94ffc4456d 100644
>>> --- a/gcc/doc/md.texi
>>> +++ b/gcc/doc/md.texi
>>> @@ -4974,6 +4974,42 @@ for (i = 1; i < operand3; i++)
>>>    operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>>>  @end smallexample
>>>  
>>> +@cindex @code{select_vl@var{m}} instruction pattern
>>> +@item @code{select_vl@var{m}}
>>> +Set operand 0 to the number of active elements in a vector to be updated
>>> +in a loop iteration based on the total number of elements to be updated,
>>> +the vectorization factor and vector properties of the target.
>>> +Operand 1 is the total number of elements to be updated.
>>> +Operand 2 is the vectorization factor.
>>> +The value of operand 0 is target dependent and may vary between iterations.
>>> +The operation of this pattern can be:
>>> +
>>> +@smallexample
>>> +Case 1:
>>> +operand0 = MIN (operand1, operand2);
>>> +operand2 can be const_poly_int or poly_int related to vector mode size.
>>> +Some targets, such as RISC-V, have a standalone instruction to compute
>>> +MIN (n, MODE SIZE), so that we can save a use of a general-purpose
>>> +register.
>>> +
>>> +In this case, only the last iteration of the loop is partial iteration.
>>> +@end smallexample
>>> +
>>> +@smallexample
>>> +Case 2:
>>> +if (operand1 <= operand2)
>>> +  operand0 = operand1;
>>> +else if (operand1 < 2 * operand2)
>>> +  operand0 = ceil (operand1 / 2);
>>> +else
>>> +  operand0 = operand2;
>>> +
>>> +This case will evenly distribute work over the last 2 iterations of a
>> stripmine loop.
>>> +@end smallexample
>>> +
>>> +The output of this pattern is not only used as the IV of the loop control
>>> +counter, but is also used as the IV of the address calculation with a
>>> +multiply/shift operation.  This allows dynamic adjustment of the number
>>> +of elements processed in each loop iteration.
>>> +
>>
>> I don't think we need to restrict the definition to the two RVV cases.
>> How about:
>>
>> -----------------------------------------------------------------------
>> Set operand 0 to the number of scalar iterations that should be handled
>> by one iteration of a vector loop.  Operand 1 is the total number of
>> scalar iterations that the loop needs to process and operand 2 is a
>> maximum bound on the result (also known as the maximum ``vectorization
>> factor'').
>>
>> The maximum value of operand 0 is given by:
>> @smallexample
>> operand0 = MIN (operand1, operand2)
>> @end smallexample
>> However, targets might choose a lower value than this, based on
>> target-specific criteria.  Each iteration of the vector loop might
>> therefore process a different number of scalar iterations, which in turn
>> means that induction variables will have a variable step.  Because of
>> this, it is generally not useful to define this instruction if it will
>> always calculate the maximum value.
>>
>> This optab is only useful on targets that implement @samp{len_load_@var{m}}
>> and/or @samp{len_store_@var{m}}.
>> -----------------------------------------------------------------------
>>
>>>  @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>>>  @item @samp{check_raw_ptrs@var{m}}
>>>  Check whether, given two pointers @var{a} and @var{b} and a length
>>>  @var{len},
>>> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
>>> index 7fe742c2ae7..6f6fa7d37f9 100644
>>> --- a/gcc/internal-fn.def
>>> +++ b/gcc/internal-fn.def
>>> @@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
>>>  DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
>>>  
>>>  DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
>>> +DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
>>>  DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
>>>                 check_raw_ptrs, check_ptrs)
>>>  DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
>>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>>> index 695f5911b30..b637471b76e 100644
>>> --- a/gcc/optabs.def
>>> +++ b/gcc/optabs.def
>>> @@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
>>>  OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
>>>  OPTAB_D (len_load_optab, "len_load_$a")
>>>  OPTAB_D (len_store_optab, "len_store_$a")
>>> +OPTAB_D (select_vl_optab, "select_vl$a")
>>> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
>>> index ff6159e08d5..81334f4f171 100644
>>> --- a/gcc/tree-vect-loop-manip.cc
>>> +++ b/gcc/tree-vect-loop-manip.cc
>>> @@ -385,6 +385,353 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_controls *dest_rgm,
>>>    return false;
>>>  }
>>>  
>>> +/* Try to adjust loop lens for non-SLP multiple rgroups.
>>> +
>>> +     _36 = MIN_EXPR <ivtmp_34, POLY_INT_CST [8, 8]>;
>>> +
>>> +     First length (MIN (X, VF/N)):
>>> +       loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
>>> +
>>> +     Second length (X - MIN (X, 1 * VF/N)):
>>> +       loop_len_16 = _36 - loop_len_15;
>>> +
>>> +     Third length (X - MIN (X, 2 * VF/N)):
>>> +       _38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
>>> +       loop_len_17 = _36 - _38;
>>> +
>>> +     Fourth length (X - MIN (X, 3 * VF/N)):
>>> +       _39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
>>> +       loop_len_18 = _36 - _39;  */
>>> +
>>> +static void
>>> +vect_adjust_loop_lens (tree iv_type, gimple_seq *seq,
>>> +               rgroup_controls *dest_rgm, rgroup_controls *src_rgm)
>>> +{
>>> +  tree ctrl_type = dest_rgm->type;
>>> +  poly_uint64 nitems_per_ctrl
>>> +    = TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
>>> +
>>> +  for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
>>> +    {
>>> +      tree src = src_rgm->controls[i / dest_rgm->controls.length ()];
>>> +      tree dest = dest_rgm->controls[i];
>>> +      gassign *stmt;
>>> +      if (i == 0)
>>> +    {
>>> +      /* MIN (X, VF*I/N) capped to the range [0, VF/N].  */
>>> +      tree factor = build_int_cst (iv_type, nitems_per_ctrl);
>>> +      stmt = gimple_build_assign (dest, MIN_EXPR, src, factor);
>>> +      gimple_seq_add_stmt (seq, stmt);
>>> +    }
>>> +      else
>>> +    {
>>> +      /* (X - MIN (X, VF*I/N)) capped to the range [0, VF/N].  */
>>> +      tree factor = build_int_cst (iv_type, nitems_per_ctrl * i);
>>> +      tree temp = make_ssa_name (iv_type);
>>> +      stmt = gimple_build_assign (temp, MIN_EXPR, src, factor);
>>> +      gimple_seq_add_stmt (seq, stmt);
>>> +      stmt = gimple_build_assign (dest, MINUS_EXPR, src, temp);
>>> +      gimple_seq_add_stmt (seq, stmt);
>>> +    }
>>> +    }
>>> +}
>>> +
>>> +/* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
>>> +   for all the rgroup controls in RGC and return a control that is nonzero
>>> +   when the loop needs to iterate.  Add any new preheader statements to
>>> +   PREHEADER_SEQ.  Use LOOP_COND_GSI to insert code before the exit gcond.
>>> +
>>> +   RGC belongs to loop LOOP.  The loop originally iterated NITERS
>>> +   times and has been vectorized according to LOOP_VINFO.
>>> +
>>> +   Unlike vect_set_loop_controls_directly, which iterates a 0-based IV
>>> +   up to TEST_LIMIT - bias,
>>> +
>>> +   in vect_set_loop_controls_by_select_vl we start at
>>> +   IV = TEST_LIMIT - bias and keep subtracting from IV the length
>>> +   calculated by the IFN_SELECT_VL pattern.
>>> +
>>> +   1. Single rgroup, the Gimple IR should be:
>>> +
>>> +    # vectp_B.6_8 = PHI <vectp_B.6_13(6), &B(5)>
>>> +    # vectp_B.8_16 = PHI <vectp_B.8_17(6), &B(5)>
>>> +    # vectp_A.11_19 = PHI <vectp_A.11_20(6), &A(5)>
>>> +    # vectp_A.13_22 = PHI <vectp_A.13_23(6), &A(5)>
>>> +    # ivtmp_26 = PHI <ivtmp_27(6), _25(5)>
>>> +    _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
>>> +    ivtmp_15 = _28 * 4;
>>> +    vect__1.10_18 = .LEN_LOAD (vectp_B.8_16, 128B, _28, 0);
>>> +    _1 = B[i_10];
>>> +    .LEN_STORE (vectp_A.13_22, 128B, _28, vect__1.10_18, 0);
>>> +    i_7 = i_10 + 1;
>>> +    vectp_B.8_17 = vectp_B.8_16 + ivtmp_15;
>>> +    vectp_A.13_23 = vectp_A.13_22 + ivtmp_15;
>>> +    ivtmp_27 = ivtmp_26 - _28;
>>> +    if (ivtmp_27 != 0)
>>> +      goto <bb 6>; [83.33%]
>>> +    else
>>> +      goto <bb 7>; [16.67%]
>>> +
>>> +   Note: We use the outcome of .SELECT_VL to adjust both the loop
>>> +   control IV and the data reference pointer IV.
>>> +
>>> +   1). The result of .SELECT_VL:
>>> +       _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
>>> +       _28 does not need to be VF in every iteration; instead, we allow
>>> +       _28 to be any value as long as _28 <= VF.  Such a flexible
>>> +       SELECT_VL pattern allows the target various optimizations across
>>> +       vector loop iterations.  Targets like RISC-V have a special
>>> +       application vector length calculation instruction which
>>> +       distributes the workload evenly over the last 2 iterations.
>>> +
>>> +       Another example is that we can even allow _28 <= VF / 2, so
>>> +       that some machines can run vector code in low-power mode.
>>> +
>>> +   2). Loop control IV:
>>> +       ivtmp_27 = ivtmp_26 - _28;
>>> +       if (ivtmp_27 != 0)
>>> +     goto <bb 6>; [83.33%]
>>> +       else
>>> +     goto <bb 7>; [16.67%]
>>> +
>>> +       This is a saturating subtraction towards zero; the outcome of
>>> +       .SELECT_VL will ensure that ivtmp_27 never wraps below zero.
>>> +
>>> +   3). Data reference pointer IV:
>>> +       ivtmp_15 = _28 * 4;
>>> +       vectp_B.8_17 = vectp_B.8_16 + ivtmp_15;
>>> +       vectp_A.13_23 = vectp_A.13_22 + ivtmp_15;
>>> +
>>> +       The pointer IV is adjusted accurately according to the .SELECT_VL.
>>> +
>>> +   2. Multiple rgroup, the Gimple IR should be:
>>> +
>>> +    # i_23 = PHI <i_20(6), 0(11)>
>>> +    # vectp_f.8_51 = PHI <vectp_f.8_52(6), f_15(D)(11)>
>>> +    # vectp_d.10_59 = PHI <vectp_d.10_60(6), d_18(D)(11)>
>>> +    # ivtmp_70 = PHI <ivtmp_71(6), _69(11)>
>>> +    # ivtmp_73 = PHI <ivtmp_74(6), _67(11)>
>>> +    _72 = MIN_EXPR <ivtmp_70, 16>;
>>> +    _75 = MIN_EXPR <ivtmp_73, 16>;
>>> +    _1 = i_23 * 2;
>>> +    _2 = (long unsigned int) _1;
>>> +    _3 = _2 * 2;
>>> +    _4 = f_15(D) + _3;
>>> +    _5 = _2 + 1;
>>> +    _6 = _5 * 2;
>>> +    _7 = f_15(D) + _6;
>>> +    .LEN_STORE (vectp_f.8_51, 128B, _75, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
>>> +    vectp_f.8_56 = vectp_f.8_51 + 16;
>>> +    .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
>>> +    _8 = (long unsigned int) i_23;
>>> +    _9 = _8 * 4;
>>> +    _10 = d_18(D) + _9;
>>> +    _61 = _75 / 2;
>>> +    .LEN_STORE (vectp_d.10_59, 128B, _61, { 3, 3, 3, 3 }, 0);
>>> +    vectp_d.10_63 = vectp_d.10_59 + 16;
>>> +    _64 = _72 / 2;
>>> +    .LEN_STORE (vectp_d.10_63, 128B, _64, { 3, 3, 3, 3 }, 0);
>>> +    i_20 = i_23 + 1;
>>> +    vectp_f.8_52 = vectp_f.8_56 + 16;
>>> +    vectp_d.10_60 = vectp_d.10_63 + 16;
>>> +    ivtmp_74 = ivtmp_73 - _75;
>>> +    ivtmp_71 = ivtmp_70 - _72;
>>> +    if (ivtmp_74 != 0)
>>> +      goto <bb 6>; [83.33%]
>>> +    else
>>> +      goto <bb 13>; [16.67%]
>>
>> In the gimple examples, I think it would help to quote only the relevant
>> parts and use ellipsis to hide things that don't directly matter.
>> E.g. in the above samples, the old scalar code isn't relevant, whereas
>> it's difficult to follow the example without knowing how _69 and _67
>> relate to each other.  It would also help to say which scalar loop
>> is being vectorised here.
>>
>>> +
>>> +   Note: We DO NOT use .SELECT_VL in SLP auto-vectorization for multiple
>>> +   rgroups. Instead, we use MIN_EXPR to guarantee we always use VF as the
>>> +   iteration amount for multiple rgroups.
>>> +
>>> +   The analysis of the flow of multiple rgroups:
>>> +    _72 = MIN_EXPR <ivtmp_70, 16>;
>>> +    _75 = MIN_EXPR <ivtmp_73, 16>;
>>> +    ...
>>> +    .LEN_STORE (vectp_f.8_51, 128B, _75, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
>>> +    vectp_f.8_56 = vectp_f.8_51 + 16;
>>> +    .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
>>> +    ...
>>> +    _61 = _75 / 2;
>>> +    .LEN_STORE (vectp_d.10_59, 128B, _61, { 3, 3, 3, 3 }, 0);
>>> +    vectp_d.10_63 = vectp_d.10_59 + 16;
>>> +    _64 = _72 / 2;
>>> +    .LEN_STORE (vectp_d.10_63, 128B, _64, { 3, 3, 3, 3 }, 0);
>>> +
>>> +  We use _72 = MIN_EXPR <ivtmp_70, 16>; to generate the number of
>>> +  elements to be processed in each iteration.
>>> +
>>> +  The related STOREs:
>>> +    _72 = MIN_EXPR <ivtmp_70, 16>;
>>> +    .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2, 1, 2, 1, 2 }, 0);
>>> +    _64 = _72 / 2;
>>> +    .LEN_STORE (vectp_d.10_63, 128B, _64, { 3, 3, 3, 3 }, 0);
>>> +  These 2 STOREs store 2 vectors, where the second vector has half as
>>> +  many elements as the first.  So the length of the second STORE is
>>> +  _64 = _72 / 2; it's similar to the VIEW_CONVERT handling of masks in SLP.
>>
>>> +
>>> +  3. Multiple rgroups for non-SLP auto-vectorization.
>>> +
>>> +     # ivtmp_26 = PHI <ivtmp_27(4), _25(3)>
>>> +     # ivtmp.35_10 = PHI <ivtmp.35_11(4), ivtmp.35_1(3)>
>>> +     # ivtmp.36_2 = PHI <ivtmp.36_8(4), ivtmp.36_23(3)>
>>> +     _28 = MIN_EXPR <ivtmp_26, POLY_INT_CST [8, 8]>;
>>> +     loop_len_15 = MIN_EXPR <_28, POLY_INT_CST [4, 4]>;
>>> +     loop_len_16   = _28 - loop_len_15;
>>> +     _29 = (void *) ivtmp.35_10;
>>> +     _7 = &MEM <vector([4,4]) int> [(int *)_29];
>>> +     vect__1.25_17 = .LEN_LOAD (_7, 128B, loop_len_15, 0);
>>> +     _33 = _29 + POLY_INT_CST [16, 16];
>>> +     _34 = &MEM <vector([4,4]) int> [(int *)_33];
>>> +     vect__1.26_19 = .LEN_LOAD (_34, 128B, loop_len_16, 0);
>>> +     vect__2.27_20 = VEC_PACK_TRUNC_EXPR <vect__1.25_17, vect__1.26_19>;
>>> +     _30 = (void *) ivtmp.36_2;
>>> +     _31 = &MEM <vector([8,8]) short int> [(short int *)_30];
>>> +     .LEN_STORE (_31, 128B, _28, vect__2.27_20, 0);
>>> +     ivtmp_27 = ivtmp_26 - _28;
>>> +     ivtmp.35_11 = ivtmp.35_10 + POLY_INT_CST [32, 32];
>>> +     ivtmp.36_8 = ivtmp.36_2 + POLY_INT_CST [16, 16];
>>> +     if (ivtmp_27 != 0)
>>> +       goto <bb 4>; [83.33%]
>>> +     else
>>> +       goto <bb 5>; [16.67%]
>>> +
>>> +     The total length: _28 = MIN_EXPR <ivtmp_26, POLY_INT_CST [8, 8]>;
>>> +
>>> +     The length of first half vector:
>>> +       loop_len_15 = MIN_EXPR <_28, POLY_INT_CST [4, 4]>;
>>> +
>>> +     The length of second half vector:
>>> +       loop_len_15 = MIN_EXPR <_28, POLY_INT_CST [4, 4]>;
>>> +       loop_len_16 = _28 - loop_len_15;
>>> +
>>> +     1). _28 always <= POLY_INT_CST [8, 8].
>>> +     2). When _28 <= POLY_INT_CST [4, 4], second half vector is not processed.
>>> +     3). When _28 > POLY_INT_CST [4, 4], second half vector is processed.
>>> +*/
>>> +
>>> +static tree
>>> +vect_set_loop_controls_by_select_vl (class loop *loop,
>>> +                     loop_vec_info loop_vinfo,
>>> +                     gimple_seq *preheader_seq,
>>> +                     gimple_seq *header_seq,
>>> +                     rgroup_controls *rgc, tree niters)
>>> +{
>>> +  tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>>> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>>> +  /* We are not allowing masked approach in SELECT_VL.  */
>>> +  gcc_assert (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
>>> +
>>> +  tree ctrl_type = rgc->type;
>>> +  unsigned int nitems_per_iter = rgc->max_nscalars_per_iter * rgc->factor;
>>> +  poly_uint64 nitems_per_ctrl = TYPE_VECTOR_SUBPARTS (ctrl_type) * rgc->factor;
>>> +  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>> +
>>> +  /* Calculate the maximum number of item values that the rgroup
>>> +     handles in total, the number that it handles for each iteration
>>> +     of the vector loop.  */
>>> +  tree nitems_total = niters;
>>> +  if (nitems_per_iter != 1)
>>> +    {
>>> +      /* We checked before setting LOOP_VINFO_USING_PARTIAL_VECTORS_P that
>>> +     these multiplications don't overflow.  */
>>> +      tree compare_factor = build_int_cst (compare_type, nitems_per_iter);
>>> +      nitems_total = gimple_build (preheader_seq, MULT_EXPR, compare_type,
>>> +                   nitems_total, compare_factor);
>>> +    }
>>> +
>>> +  /* Convert the comparison value to the IV type (either a no-op or
>>> +     a promotion).  */
>>> +  nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
>>> +
>>> +  /* Create an induction variable that counts the number of items
>>> +     processed.  */
>>> +  tree index_before_incr, index_after_incr;
>>> +  gimple_stmt_iterator incr_gsi;
>>> +  bool insert_after;
>>> +  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
>>> +
>>> +  /* Test the decremented IV, which will never wrap below 0 since we
>>> +     have IFN_SELECT_VL to guarantee that.  */
>>> +  tree test_limit = nitems_total;
>>> +
>>> +  /* Provide a definition of each control in the group.  */
>>> +  tree ctrl;
>>> +  unsigned int i;
>>> +  FOR_EACH_VEC_ELT_REVERSE (rgc->controls, i, ctrl)
>>> +    {
>>> +      /* Previous controls will cover BIAS items.  This control covers the
>>> +     next batch.  */
>>> +      poly_uint64 bias = nitems_per_ctrl * i;
>>> +      tree bias_tree = build_int_cst (iv_type, bias);
>>> +
>>> +      /* Rather than have a new IV that starts at TEST_LIMIT and goes
>>> +     down to BIAS, prefer to use the same TEST_LIMIT - BIAS based IV
>>> +     for each control and adjust the bound down by BIAS.  */
>>> +      tree this_test_limit = test_limit;
>>> +      if (i != 0)
>>> +    {
>>> +      this_test_limit = gimple_build (preheader_seq, MAX_EXPR, iv_type,
>>> +                      this_test_limit, bias_tree);
>>> +      this_test_limit = gimple_build (preheader_seq, MINUS_EXPR, iv_type,
>>> +                      this_test_limit, bias_tree);
>>> +    }
>>> +
>>> +      /* Create decrement IV.  */
>>> +      create_iv (this_test_limit, MINUS_EXPR, ctrl, NULL_TREE, loop,
>>> +         &incr_gsi, insert_after, &index_before_incr, &index_after_incr);
>>> +
>>> +      poly_uint64 final_vf = vf * nitems_per_iter;
>>> +      tree vf_step = build_int_cst (iv_type, final_vf);
>>> +      tree res_len;
>>> +      if (LOOP_VINFO_LENS (loop_vinfo).length () == 1)
>>> +    {
>>> +      res_len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
>>> +                  index_before_incr, vf_step);
>>> +      LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
>>
>> The middle of this loop seems too "deep down" to be setting this.
>> I think it would make sense to do it after:
>>
>>  /* If we still have the option of using partial vectors,
>>     check whether we can generate the necessary loop controls.  */
>>  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
>>      && !vect_verify_full_masking (loop_vinfo)
>>      && !vect_verify_loop_lens (loop_vinfo))
>>    LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>>
>> in vect_analyze_loop_2.
>>
>> I think it'd help for review purposes to split this patch into two
>> (independently tested) pieces:
>>
>> (1) Your cases 2 and 3, where AIUI the main change is to use
>>    a decrementing loop control IV that counts scalars.  This can
>>    be done for any loop that:
>>
>>    (a) uses length "controls"; and
>>    (b) can iterate more than once
>>
>>    Initially this patch would handle case 1 in the same way.
>>
>>    Conceptually, I think it would make sense for this case to use:
>>
>>    - a signed control IV
>>    - with a constant VF step
>>    - and a loop-back test for > 0
>>
>>    in cases where we can prove that that doesn't overflow.  But I
>>    accept that using:
>>
>>    - an unsigned control IV
>>    - with a variable step
>>    - and a loop-back test for != 0
>>
>>    is more general.  So it's OK to handle just that case.  The
>>    optimisation to use signed control IVs could be left to future work.
>>
>> (2) Add SELECT_VL, where AIUI the main change (relative to (1))
>>    is to use a variable step for other IVs too.
>>
>> This is just for review purposes, and to help to separate concepts.
>> SELECT_VL is still an important part of the end result.
>>
>> Thanks,
>> Richard
>>
>>> +    }
>>> +      else
>>> +    {
>>> +      /* For SLP, we can't allow a non-VF number of elements to be
>>> +         processed in a non-final iteration.  We force each non-final
>>> +         iteration to process exactly VF elements.  Allowing a non-VF
>>> +         number of elements in a non-final iteration would make SLP
>>> +         too complicated and produce inferior codegen.
>>> +
>>> +           For example:
>>> +
>>> +        If non-final iteration process VF elements.
>>> +
>>> +          ...
>>> +          .LEN_STORE (vectp_f.8_51, 128B, _71, { 1, 2, 1, 2 }, 0);
>>> +          .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2 }, 0);
>>> +          ...
>>> +
>>> +        If non-final iteration process non-VF elements.
>>> +
>>> +          ...
>>> +          .LEN_STORE (vectp_f.8_51, 128B, _71, { 1, 2, 1, 2 }, 0);
>>> +          if (_71 % 2 == 0)
>>> +           .LEN_STORE (vectp_f.8_56, 128B, _72, { 1, 2, 1, 2 }, 0);
>>> +          else
>>> +           .LEN_STORE (vectp_f.8_56, 128B, _72, { 2, 1, 2, 1 }, 0);
>>> +          ...
>>> +
>>> +         This is the simple case of 2-element interleaved vector SLP.
>>> +         If we consider other interleaved vectors, the situation becomes
>>> +         more complicated.  */
>>> +      res_len = gimple_build (header_seq, MIN_EXPR, iv_type,
>>> +                  index_before_incr, vf_step);
>>> +      if (rgc->max_nscalars_per_iter != 1)
>>> +        LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P (loop_vinfo) = true;
>>> +    }
>>> +      gassign *assign = gimple_build_assign (ctrl, res_len);
>>> +      gimple_seq_add_stmt (header_seq, assign);
>>> +    }
>>> +
>>> +  return index_after_incr;
>>> +}
>>> +
>>>  /* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
>>>     for all the rgroup controls in RGC and return a control that is nonzero
>>>     when the loop needs to iterate.  Add any new preheader statements to
>>> @@ -704,6 +1051,10 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>>>  
>>>    bool use_masks_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>>>    tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>>> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>>> +  bool use_vl_p = !use_masks_p
>>> +          && direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
>>> +                             OPTIMIZE_FOR_SPEED);
>>>    unsigned int compare_precision = TYPE_PRECISION (compare_type);
>>>    tree orig_niters = niters;
>>>  
>>> @@ -753,17 +1104,34 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>>>            continue;
>>>        }
>>>  
>>> +    if (use_vl_p && rgc->max_nscalars_per_iter == 1
>>> +        && rgc != &LOOP_VINFO_LENS (loop_vinfo)[0])
>>> +      {
>>> +        rgroup_controls *sub_rgc
>>> +          = &(*controls)[nmasks / rgc->controls.length () - 1];
>>> +        if (!sub_rgc->controls.is_empty ())
>>> +          {
>>> +        vect_adjust_loop_lens (iv_type, &header_seq, rgc, sub_rgc);
>>> +        continue;
>>> +          }
>>> +      }
>>> +
>>>      /* See whether zero-based IV would ever generate all-false masks
>>>         or zero length before wrapping around.  */
>>>      bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
>>>  
>>>      /* Set up all controls for this group.  */
>>> -    test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
>>> -                             &preheader_seq,
>>> -                             &header_seq,
>>> -                             loop_cond_gsi, rgc,
>>> -                             niters, niters_skip,
>>> -                             might_wrap_p);
>>> +    if (use_vl_p)
>>> +      test_ctrl
>>> +        = vect_set_loop_controls_by_select_vl (loop, loop_vinfo,
>>> +                           &preheader_seq, &header_seq,
>>> +                           rgc, niters);
>>> +    else
>>> +      test_ctrl
>>> +        = vect_set_loop_controls_directly (loop, loop_vinfo,
>>> +                           &preheader_seq,
>>> +                           &header_seq, loop_cond_gsi, rgc,
>>> +                           niters, niters_skip,
>>> +                           might_wrap_p);
>>>        }
>>>  
>>>    /* Emit all accumulated statements.  */
>>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>>> index ed0166fedab..fe6af4286bf 100644
>>> --- a/gcc/tree-vect-loop.cc
>>> +++ b/gcc/tree-vect-loop.cc
>>> @@ -973,6 +973,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>>>      vectorizable (false),
>>>      can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
>>>      using_partial_vectors_p (false),
>>> +    using_select_vl_p (false),
>>> +    using_slp_adjusted_len_p (false),
>>>      epil_using_partial_vectors_p (false),
>>>      partial_load_store_bias (0),
>>>      peeling_for_gaps (false),
>>> @@ -10361,15 +10363,18 @@ vect_record_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>>>  }
>>>  
>>>  /* Given a complete set of length LENS, extract length number INDEX for an
>>> -   rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>>> +   rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.
>>> +   Insert any set-up statements before GSI.  */
>>>  
>>>  tree
>>> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>>> -           unsigned int nvectors, unsigned int index)
>>> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
>>> +           vec_loop_lens *lens, unsigned int nvectors, tree vectype,
>>> +           unsigned int index)
>>>  {
>>>    rgroup_controls *rgl = &(*lens)[nvectors - 1];
>>>    bool use_bias_adjusted_len =
>>>      LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
>>> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>>>  
>>>    /* Populate the rgroup's len array, if this is the first time we've
>>>       used it.  */
>>> @@ -10400,6 +10405,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
>>>  
>>>    if (use_bias_adjusted_len)
>>>      return rgl->bias_adjusted_ctrl;
>>> +  else if (LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P (loop_vinfo))
>>> +    {
>>> +      tree loop_len = rgl->controls[index];
>>> +      poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
>>> +      poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
>>> +      if (maybe_ne (nunits1, nunits2))
>>> +    {
>>> +      /* A loop len for data type X can be reused for data type Y
>>> +         if X has N times more elements than Y and if Y's elements
>>> +         are N times bigger than X's.  */
>>> +      gcc_assert (multiple_p (nunits1, nunits2));
>>> +      unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
>>> +      gimple_seq seq = NULL;
>>> +      loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
>>> +                   build_int_cst (iv_type, factor));
>>> +      if (seq)
>>> +        gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
>>> +    }
>>> +      return loop_len;
>>> +    }
>>>    else
>>>      return rgl->controls[index];
>>>  }
>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>>> index 7313191b0db..15b22132bd6 100644
>>> --- a/gcc/tree-vect-stmts.cc
>>> +++ b/gcc/tree-vect-stmts.cc
>>> @@ -3147,6 +3147,61 @@ vect_get_data_ptr_increment (vec_info *vinfo,
>>>    return iv_step;
>>>  }
>>>  
>>> +/* Prepare the pointer IVs which need to be updated by a variable amount.
>>> +   That variable amount is the outcome of .SELECT_VL.  In this case, each
>>> +   iteration may process a flexible number of elements, as long as that
>>> +   number is <= vf elements.
>>> +
>>> +   Return the data reference pointer according to SELECT_VL.
>>> +   If new statements are needed, insert them before GSI.  */
>>> +
>>> +static tree
>>> +get_select_vl_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
>>> +                tree aggr_type, class loop *at_loop, tree offset,
>>> +                tree *dummy, gimple_stmt_iterator *gsi,
>>> +                bool simd_lane_access_p, vec_loop_lens *loop_lens,
>>> +                dr_vec_info *dr_info,
>>> +                vect_memory_access_type memory_access_type)
>>> +{
>>> +  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
>>> +  tree step = vect_dr_behavior (vinfo, dr_info)->step;
>>> +
>>> +  /* TODO: We don't support gather/scatter or load_lanes/store_lanes when
>>> +     pointer IVs are updated by a variable amount, but we will support
>>> +     them in the future.  */
>>> +  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
>>> +          && memory_access_type != VMAT_LOAD_STORE_LANES);
>>> +
>>> +  /* When the SELECT_VL pattern is in use, we dynamically adjust
>>> +     the memory address by the .SELECT_VL result.
>>> +
>>> +     The result of .SELECT_VL is the number of elements to be
>>> +     processed in each iteration.  So the memory address
>>> +     adjustment operation should be:
>>> +
>>> +     bytesize = GET_MODE_SIZE (element_mode (aggr_type));
>>> +     addr = addr + .SELECT_VL (ARG..) * bytesize;
>>> +  */
>>> +  gimple *ptr_incr;
>>> +  tree loop_len
>>> +    = vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, aggr_type, 0);
>>> +  tree len_type = TREE_TYPE (loop_len);
>>> +  poly_uint64 bytesize = GET_MODE_SIZE (element_mode (aggr_type));
>>> +  /* Since the outcome of .SELECT_VL is a count of elements, we must
>>> +     scale it by the element size in bytes before using it as the
>>> +     variable amount by which the pointer IV is bumped.  */
>>> +  tree tmp = fold_build2 (MULT_EXPR, len_type, loop_len,
>>> +              build_int_cst (len_type, bytesize));
>>> +  if (tree_int_cst_sgn (step) == -1)
>>> +    tmp = fold_build1 (NEGATE_EXPR, len_type, tmp);
>>> +  tree bump = make_temp_ssa_name (len_type, NULL, "ivtmp");
>>> +  gassign *assign = gimple_build_assign (bump, tmp);
>>> +  gsi_insert_before (gsi, assign, GSI_SAME_STMT);
>>> +  return vect_create_data_ref_ptr (vinfo, stmt_info, aggr_type, at_loop, offset,
>>> +                   dummy, gsi, &ptr_incr, simd_lane_access_p,
>>> +                   bump);
>>> +}
>>> +
>>>  /* Check and perform vectorization of BUILT_IN_BSWAP{16,32,64,128}.  */
>>>  
>>>  static bool
>>> @@ -8547,6 +8602,14 @@ vectorizable_store (vec_info *vinfo,
>>>          vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
>>>                       slp_node, &gs_info, &dataref_ptr,
>>>                       &vec_offsets);
>>> +      else if (loop_vinfo && LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)
>>> +           && memory_access_type != VMAT_INVARIANT)
>>> +        dataref_ptr
>>> +          = get_select_vl_data_ref_ptr (vinfo, stmt_info, aggr_type,
>>> +                        simd_lane_access_p ? loop : NULL,
>>> +                        offset, &dummy, gsi,
>>> +                        simd_lane_access_p, loop_lens,
>>> +                        dr_info, memory_access_type);
>>>        else
>>>          dataref_ptr
>>>            = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
>>> @@ -8795,8 +8858,9 @@ vectorizable_store (vec_info *vinfo,
>>>            else if (loop_lens)
>>>          {
>>>            tree final_len
>>> -            = vect_get_loop_len (loop_vinfo, loop_lens,
>>> -                     vec_num * ncopies, vec_num * j + i);
>>> +            = vect_get_loop_len (loop_vinfo, gsi, loop_lens,
>>> +                     vec_num * ncopies, vectype,
>>> +                     vec_num * j + i);
>>>            tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT);
>>>            machine_mode vmode = TYPE_MODE (vectype);
>>>            opt_machine_mode new_ovmode
>>> @@ -9935,6 +9999,13 @@ vectorizable_load (vec_info *vinfo,
>>>                         slp_node, &gs_info, &dataref_ptr,
>>>                         &vec_offsets);
>>>          }
>>> +      else if (loop_vinfo && LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)
>>> +           && memory_access_type != VMAT_INVARIANT)
>>> +        dataref_ptr
>>> +          = get_select_vl_data_ref_ptr (vinfo, stmt_info, aggr_type,
>>> +                        at_loop, offset, &dummy, gsi,
>>> +                        simd_lane_access_p, loop_lens,
>>> +                        dr_info, memory_access_type);
>>>        else
>>>          dataref_ptr
>>>            = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
>>> @@ -10151,8 +10222,8 @@ vectorizable_load (vec_info *vinfo,
>>>              else if (loop_lens && memory_access_type != VMAT_INVARIANT)
>>>                {
>>>              tree final_len
>>> -              = vect_get_loop_len (loop_vinfo, loop_lens,
>>> -                           vec_num * ncopies,
>>> +              = vect_get_loop_len (loop_vinfo, gsi, loop_lens,
>>> +                           vec_num * ncopies, vectype,
>>>                             vec_num * j + i);
>>>              tree ptr = build_int_cst (ref_type,
>>>                            align * BITS_PER_UNIT);
>>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>>> index 9cf2fb23fe3..3d21e23513d 100644
>>> --- a/gcc/tree-vectorizer.h
>>> +++ b/gcc/tree-vectorizer.h
>>> @@ -818,6 +818,13 @@ public:
>>>       the vector loop can handle fewer than VF scalars.  */
>>>    bool using_partial_vectors_p;
>>>  
>>> +  /* True if we've decided to use SELECT_VL to get the number of active
>>> +     elements in a vector loop to be updated.  */
>>> +  bool using_select_vl_p;
>>> +
>>> +  /* True if we use an adjusted loop length for SLP.  */
>>> +  bool using_slp_adjusted_len_p;
>>> +
>>>    /* True if we've decided to use partially-populated vectors for the
>>>       epilogue of loop.  */
>>>    bool epil_using_partial_vectors_p;
>>> @@ -890,6 +897,8 @@ public:
>>>  #define LOOP_VINFO_VECTORIZABLE_P(L)       (L)->vectorizable
>>>  #define LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P(L) (L)->can_use_partial_vectors_p
>>>  #define LOOP_VINFO_USING_PARTIAL_VECTORS_P(L) (L)->using_partial_vectors_p
>>> +#define LOOP_VINFO_USING_SELECT_VL_P(L) (L)->using_select_vl_p
>>> +#define LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P(L) (L)->using_slp_adjusted_len_p
>>>  #define LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P(L)                     \
>>>    (L)->epil_using_partial_vectors_p
>>>  #define LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS(L) (L)->partial_load_store_bias
>>> @@ -2293,7 +2302,8 @@ extern tree vect_get_loop_mask (gimple_stmt_iterator *, vec_loop_masks *,
>>>                  unsigned int, tree, unsigned int);
>>>  extern void vect_record_loop_len (loop_vec_info, vec_loop_lens *, unsigned int,
>>>                    tree, unsigned int);
>>> -extern tree vect_get_loop_len (loop_vec_info, vec_loop_lens *, unsigned int,
>>> +extern tree vect_get_loop_len (loop_vec_info, gimple_stmt_iterator *,
>>> +                   vec_loop_lens *, unsigned int, tree,
>>>                     unsigned int);
>>>  extern gimple_seq vect_gen_len (tree, tree, tree, tree);
>>>  extern stmt_vec_info info_for_reduction (vec_info *, stmt_vec_info);

      parent reply	other threads:[~2023-05-12 13:26 UTC|newest]

Thread overview: 4+ messages
2023-05-11 23:11 juzhe.zhong
2023-05-12 11:08 ` Richard Sandiford
     [not found] ` <5C0696881FC420F8+A4387224-068E-4647-B237-BC14AE06A32D@rivai.ai>
2023-05-12 11:39   ` Richard Sandiford
     [not found]   ` <A80AA82F40C90097+853F3055-CCC0-44A5-B9C8-A931F855E0FB@rivai.ai>
2023-05-12 13:25     ` Richard Sandiford [this message]
