From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by sourceware.org (Postfix) with ESMTPS id 6D78C3858D32 for ; Fri, 10 Nov 2023 13:35:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6D78C3858D32 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6D78C3858D32 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:67c:2178:6::1d ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699623303; cv=none; b=MHxW+f6sL3Z9bDmTV1ITq7Ksm56XFUybbGBBeitNrE861JdTJZ1sw4BhS6sciaO6RGKwnxOYNktKjtIlnCeWZyNGOdFKKyPQahhP+ZRsamwkvXuxjvxeZCLyMZ1BYQDCT+WcnAz4ZYQV2VaAwioMfqCuEq353NryoJgH7/O/8K8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699623303; c=relaxed/simple; bh=ZGj/F5sOPrqgjdQvsFEd0udOkAYBCAJW6SJLnmFVRJw=; h=DKIM-Signature:DKIM-Signature:Date:From:To:Subject:Message-ID: MIME-Version; b=gJTKp1fwM4A6EXP/zwHly3mduzLycDH/IWrjdzG3uMkAYLjQyx7VJUCqf2ynWwuI6GsphkSmo/+Aj4Rxb1eW285Bj2hQkz+rzJt4PMDoXQfvm0i0PvCVp7nwvN1u7r0/+omA04HVupSeoNsE6dnS9TOZP43KNNOmgxPgmEWcjPI= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 724D31F8BE; Fri, 10 Nov 2023 13:35:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1699623300; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Z/rqsQ7BHqZaa30d2HgL6JQ/GhabJ6hHZJaOHBAZo8U=; b=UC8C8rwfha6kSvLplw/HeyooxAjex4a7pYEHa8ukKWCU72y/NZh56p1NrwHJ8KWfOD63Ky fDVf8YuSogH2Rzp9Ayan7VjJdB9kDWw8IV79JxZTi0bQ/A5tpj0xDqlpcJNxTAQuqj+nmr VJwFoHDNMByWzl5d6zJdJc2UFeG9nbI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1699623300; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=Z/rqsQ7BHqZaa30d2HgL6JQ/GhabJ6hHZJaOHBAZo8U=; b=DXQAx1yvtRXxZDAy+vslJBsumsW8+r3HFLMIie/d9YcVwgkTx7Bu25nSPfA26vor3DEV21 ZWsV2ldJLza1BkBg== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 2B9722D196; Fri, 10 Nov 2023 13:35:00 +0000 (UTC) Date: Fri, 10 Nov 2023 13:35:00 +0000 (UTC) From: Richard Biener To: Juzhe-Zhong cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: Re: [PATCH V2] Middle-end: Fix bug of induction variable vectorization for RVV In-Reply-To: <20231110122011.3626658-1-juzhe.zhong@rivai.ai> Message-ID: References: <20231110122011.3626658-1-juzhe.zhong@rivai.ai> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 10 Nov 2023, Juzhe-Zhong wrote: > PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438 > > 1. Since SELECT_VL result is not necessary always VF in non-final iteration. > > Current GIMPLE IR is wrong: > > # vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)> > ... > _35 = .SELECT_VL (ivtmp_33, VF); > _21 = vect_vec_iv_.8_22 + { VF, ... }; > > E.g. Consider the total iterations N = 6, the VF = 4. > Since SELECT_VL output is defined as not always to be VF in non-final iteration > which needs to depend on hardware implementation. > > Suppose we have a RVV CPU core with vsetvl doing even distribution workload optimization. > It may process 3 elements at the 1st iteration and 3 elements at the last iteration. > Then the induction variable here: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... }; > is wrong which is adding VF, which is 4, actually, we didn't process 4 elements. > > It should be adding 3 elements which is the result of SELECT_VL. > So, here the correct IR should be: > > _36 = .SELECT_VL (ivtmp_34, VF); > _22 = (int) _36; > vect_cst__21 = [vec_duplicate_expr] _22; > > 2. This issue only happens on non-SLP vectorization single rgroup since: > > if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)) > { > tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo); > if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type, > OPTIMIZE_FOR_SPEED) > && LOOP_VINFO_LENS (loop_vinfo).length () == 1 > && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp > && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())) > LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true; > } > > 3. This issue doesn't appears on nested loop no matter LOOP_VINFO_USING_SELECT_VL_P is true or false. > > Since: > > # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)> > # vect_diff_15.7_20 = PHI > _19 = vect_vec_iv_.6_5 + { 1, ... }; > vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, vect_diff_15.7_20, vect_diff_15.7_20, _28, 0); > ivtmp_1 = ivtmp_4 + 4294967295; > .... > [local count: 6549826]: > # vect_diff_18.5_11 = PHI > # ivtmp_26 = PHI > _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]); > goto ; [100.00%] > > Note the induction variable IR: _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... }; update induction variable > independent on VF (or don't care about how many elements are processed in the iteration). > > The update is loop invariant. So it won't be the problem even if LOOP_VINFO_USING_SELECT_VL_P is true. > > Testing passed, Ok for trunk ? OK. Richard. > PR tree-optimization/112438 > > gcc/ChangeLog: > > * tree-vect-loop.cc (vectorizable_induction): > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/pr112438.c: New test. > > --- > .../gcc.target/riscv/rvv/autovec/pr112438.c | 33 +++++++++++++++++++ > gcc/tree-vect-loop.cc | 30 ++++++++++++++++- > 2 files changed, 62 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c > new file mode 100644 > index 00000000000..51f90df38a0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c > @@ -0,0 +1,33 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */ > + > +void > +foo (int n, int *__restrict in, int *__restrict out) > +{ > + for (int i = 0; i < n; i += 1) > + { > + out[i] = in[i] + i; > + } > +} > + > +void > +foo2 (int n, float * __restrict in, > +float * __restrict out) > +{ > + for (int i = 0; i < n; i += 1) > + { > + out[i] = in[i] + i; > + } > +} > + > +void > +foo3 (int n, float * __restrict in, > +float * __restrict out, float x) > +{ > + for (int i = 0; i < n; i += 1) > + { > + out[i] = in[i] + i* i; > + } > +} > + > +/* We don't want to see vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }. */ > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 8abc1937d74..b152072c969 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -10306,10 +10306,36 @@ vectorizable_induction (loop_vec_info loop_vinfo, > > > /* Create the vector that holds the step of the induction. */ > + gimple_stmt_iterator *step_iv_si = NULL; > if (nested_in_vect_loop) > /* iv_loop is nested in the loop to be vectorized. Generate: > vec_step = [S, S, S, S] */ > new_name = step_expr; > + else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) > + { > + /* When we're using loop_len produced by SELEC_VL, the non-final > + iterations are not always processing VF elements. So vectorize > + induction variable instead of > + > + _21 = vect_vec_iv_.6_22 + { VF, ... }; > + > + We should generate: > + > + _35 = .SELECT_VL (ivtmp_33, VF); > + vect_cst__22 = [vec_duplicate_expr] _35; > + _21 = vect_vec_iv_.6_22 + vect_cst__22; */ > + gcc_assert (!slp_node); > + gimple_seq seq = NULL; > + vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo); > + tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0); > + expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr), > + unshare_expr (len)), > + &seq, true, NULL_TREE); > + new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr), expr, > + step_expr); > + gsi_insert_seq_before (&si, seq, GSI_SAME_STMT); > + step_iv_si = &si; > + } > else > { > /* iv_loop is the loop to be vectorized. Generate: > @@ -10336,7 +10362,7 @@ vectorizable_induction (loop_vec_info loop_vinfo, > || TREE_CODE (new_name) == SSA_NAME); > new_vec = build_vector_from_val (step_vectype, t); > vec_step = vect_init_vector (loop_vinfo, stmt_info, > - new_vec, step_vectype, NULL); > + new_vec, step_vectype, step_iv_si); > > > /* Create the following def-use cycle: > @@ -10382,6 +10408,8 @@ vectorizable_induction (loop_vec_info loop_vinfo, > gimple_seq seq = NULL; > /* FORNOW. This restriction should be relaxed. */ > gcc_assert (!nested_in_vect_loop); > + /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false if ncopies > 1. */ > + gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); > > /* Create the vector that holds the step of the induction. */ > if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr))) > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)