From: Richard Biener <rguenther@suse.de>
To: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Re: [PATCH V2] Middle-end: Fix bug of induction variable vectorization for RVV
Date: Fri, 10 Nov 2023 13:35:00 +0000 (UTC) [thread overview]
Message-ID: <nycvar.YFH.7.77.849.2311101334530.8772@jbgna.fhfr.qr> (raw)
In-Reply-To: <20231110122011.3626658-1-juzhe.zhong@rivai.ai>
On Fri, 10 Nov 2023, Juzhe-Zhong wrote:
> PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
>
> 1. Since SELECT_VL result is not necessary always VF in non-final iteration.
>
> Current GIMPLE IR is wrong:
>
> # vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
> ...
> _35 = .SELECT_VL (ivtmp_33, VF);
> _21 = vect_vec_iv_.8_22 + { VF, ... };
>
> E.g. Consider the total iterations N = 6, the VF = 4.
> Since SELECT_VL output is defined as not always to be VF in non-final iteration
> which needs to depend on hardware implementation.
>
> Suppose we have a RVV CPU core with vsetvl doing even distribution workload optimization.
> It may process 3 elements at the 1st iteration and 3 elements at the last iteration.
> Then the induction variable here: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
> is wrong which is adding VF, which is 4, actually, we didn't process 4 elements.
>
> It should be adding 3 elements which is the result of SELECT_VL.
> So, here the correct IR should be:
>
> _36 = .SELECT_VL (ivtmp_34, VF);
> _22 = (int) _36;
> vect_cst__21 = [vec_duplicate_expr] _22;
>
> 2. This issue only happens on non-SLP vectorization single rgroup since:
>
> if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> {
> tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
> OPTIMIZE_FOR_SPEED)
> && LOOP_VINFO_LENS (loop_vinfo).length () == 1
> && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
> && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
> LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
> }
>
> 3. This issue doesn't appears on nested loop no matter LOOP_VINFO_USING_SELECT_VL_P is true or false.
>
> Since:
>
> # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
> # vect_diff_15.7_20 = PHI <vect_diff_9.8_22(3), vect_diff_18.5_11(5)>
> _19 = vect_vec_iv_.6_5 + { 1, ... };
> vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
> ivtmp_1 = ivtmp_4 + 4294967295;
> ....
> <bb 5> [local count: 6549826]:
> # vect_diff_18.5_11 = PHI <vect_diff_9.8_22(4), { 0, ... }(2)>
> # ivtmp_26 = PHI <ivtmp_27(4), 40(2)>
> _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
> goto <bb 3>; [100.00%]
>
> Note the induction variable IR: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... }; update induction variable
> independent on VF (or don't care about how many elements are processed in the iteration).
>
> The update is loop invariant. So it won't be the problem even if LOOP_VINFO_USING_SELECT_VL_P is true.
>
> Testing passed, Ok for trunk ?
OK.
Richard.
> PR tree-optimization/112438
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vectorizable_induction):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr112438.c: New test.
>
> ---
> .../gcc.target/riscv/rvv/autovec/pr112438.c | 33 +++++++++++++++++++
> gcc/tree-vect-loop.cc | 30 ++++++++++++++++-
> 2 files changed, 62 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> new file mode 100644
> index 00000000000..51f90df38a0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
> +
> +void
> +foo (int n, int *__restrict in, int *__restrict out)
> +{
> + for (int i = 0; i < n; i += 1)
> + {
> + out[i] = in[i] + i;
> + }
> +}
> +
> +void
> +foo2 (int n, float * __restrict in,
> +float * __restrict out)
> +{
> + for (int i = 0; i < n; i += 1)
> + {
> + out[i] = in[i] + i;
> + }
> +}
> +
> +void
> +foo3 (int n, float * __restrict in,
> +float * __restrict out, float x)
> +{
> + for (int i = 0; i < n; i += 1)
> + {
> + out[i] = in[i] + i* i;
> + }
> +}
> +
> +/* We don't want to see vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }. */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8abc1937d74..b152072c969 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10306,10 +10306,36 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>
>
> /* Create the vector that holds the step of the induction. */
> + gimple_stmt_iterator *step_iv_si = NULL;
> if (nested_in_vect_loop)
> /* iv_loop is nested in the loop to be vectorized. Generate:
> vec_step = [S, S, S, S] */
> new_name = step_expr;
> + else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> + {
> + /* When we're using loop_len produced by SELEC_VL, the non-final
> + iterations are not always processing VF elements. So vectorize
> + induction variable instead of
> +
> + _21 = vect_vec_iv_.6_22 + { VF, ... };
> +
> + We should generate:
> +
> + _35 = .SELECT_VL (ivtmp_33, VF);
> + vect_cst__22 = [vec_duplicate_expr] _35;
> + _21 = vect_vec_iv_.6_22 + vect_cst__22; */
> + gcc_assert (!slp_node);
> + gimple_seq seq = NULL;
> + vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> + tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
> + expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
> + unshare_expr (len)),
> + &seq, true, NULL_TREE);
> + new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr), expr,
> + step_expr);
> + gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
> + step_iv_si = &si;
> + }
> else
> {
> /* iv_loop is the loop to be vectorized. Generate:
> @@ -10336,7 +10362,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> || TREE_CODE (new_name) == SSA_NAME);
> new_vec = build_vector_from_val (step_vectype, t);
> vec_step = vect_init_vector (loop_vinfo, stmt_info,
> - new_vec, step_vectype, NULL);
> + new_vec, step_vectype, step_iv_si);
>
>
> /* Create the following def-use cycle:
> @@ -10382,6 +10408,8 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> gimple_seq seq = NULL;
> /* FORNOW. This restriction should be relaxed. */
> gcc_assert (!nested_in_vect_loop);
> + /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false if ncopies > 1. */
> + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
>
> /* Create the vector that holds the step of the induction. */
> if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
prev parent reply other threads:[~2023-11-10 13:35 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-10 12:20 Juzhe-Zhong
2023-11-10 13:35 ` Richard Biener [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nycvar.YFH.7.77.849.2311101334530.8772@jbgna.fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=juzhe.zhong@rivai.ai \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).