Date: Fri, 10 Nov 2023 10:19:54 +0000 (UTC)
From: Richard Biener
To: "juzhe.zhong@rivai.ai"
Cc: Richard Biener, gcc-patches, "richard.sandiford", "kito.cheng", "Kito.cheng"
Subject: Re: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV
In-Reply-To: <13B9AC74355D1006+2023111018134955379425@rivai.ai>
References: <20231108105317.1786716-1-juzhe.zhong@rivai.ai>, <13B9AC74355D1006+2023111018134955379425@rivai.ai>

On Fri, 10 Nov 2023, juzhe.zhong@rivai.ai wrote:

> Hi, Richard.
>
> I am sorry for bothering you. I am trying to understand what you mean.
>
> Is the following code what you want?
>
>   /* Create the vector that holds the step of the induction. */
>   if (nested_in_vect_loop)
>     {
>       /* iv_loop is nested in the loop to be vectorized. Generate:
>          vec_step = [S, S, S, S] */
>       new_name = step_expr;
>       /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false in nested loop. */
>       gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
>       t = unshare_expr (new_name);
>       gcc_assert (CONSTANT_CLASS_P (new_name)
>                   || TREE_CODE (new_name) == SSA_NAME);
>       new_vec = build_vector_from_val (step_vectype, t);
>       vec_step
>         = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, NULL);
>     }
>   else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
>     {
>       /* When we're using loop_len produced by SELEC_VL, the non-final
>          iterations are not always processing VF elements. So vectorize
>          induction variable instead of
>
>            _21 = vect_vec_iv_.6_22 + { VF, ... };
>
>          We should generate:
>
>            _35 = .SELECT_VL (ivtmp_33, VF);
>            vect_cst__22 = [vec_duplicate_expr] _35;
>            _21 = vect_vec_iv_.6_22 + vect_cst__22; */
>       vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>       tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
>       expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
>                                                  unshare_expr (len)),
>                                    &seq, true, NULL_TREE);
>       gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
>       t = unshare_expr (new_name);
>       gcc_assert (CONSTANT_CLASS_P (new_name)
>                   || TREE_CODE (new_name) == SSA_NAME);
>       new_vec = build_vector_from_val (step_vectype, t);
>       vec_step
>         = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, &si);
>     }
>   else
>     {
>       /* iv_loop is the loop to be vectorized. Generate:
>          vec_step = [VF*S, VF*S, VF*S, VF*S] */
>       gimple_seq seq = NULL;
>       if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
>         {
>           expr = build_int_cst (integer_type_node, vf);
>           expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
>         }
>       else
>         expr = build_int_cst (TREE_TYPE (step_expr), vf);
>       new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
>                                expr, step_expr);
>       if (seq)
>         {
>           new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
>           gcc_assert (!new_bb);
>         }
>       t = unshare_expr (new_name);
>       gcc_assert (CONSTANT_CLASS_P (new_name)
>                   || TREE_CODE (new_name) == SSA_NAME);
>       new_vec = build_vector_from_val (step_vectype, t);
>       vec_step
>         = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype, NULL);
>     }
>
> It seems that the following code:
>
>   t = unshare_expr (new_name);
>   gcc_assert (CONSTANT_CLASS_P (new_name)
>               || TREE_CODE (new_name) == SSA_NAME);
>   new_vec = build_vector_from_val (step_vectype, t);
>   vec_step
>     = vect_init_vector
>
> appears 3 times. I am not sure whether this is the way you want it?

I'd avoid that particular bit by having

  gimple_stmt_iterator *si = NULL;

before the if () and setting that accordingly only in the
LOOP_VINFO_USING_SELECT_VL_P path.

But otherwise yes.

Richard.

>
> Thanks.
>
>
>
> juzhe.zhong@rivai.ai
>
> From: Richard Biener
> Date: 2023-11-10 17:46
> To: Juzhe-Zhong
> CC: richard.guenther; gcc-patches; richard.sandiford; kito.cheng; kito.cheng
> Subject: Re: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV
> On Thu, 9 Nov 2023, Juzhe-Zhong wrote:
>
> > Hi, Richard.
> >
> > >> I think it would be better to split out building a tree from VF from both
> > >> arms and avoid using 'vf' when LOOP_VINFO_USING_SELECT_VL_P.
> >
> > I am trying to split out building the tree from both arms as you suggested.
> > Could you take a look at the following code?
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 8abc1937d74..24a86187d11 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10315,19 +10315,47 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >           /* iv_loop is the loop to be vectorized. Generate:
> >              vec_step = [VF*S, VF*S, VF*S, VF*S] */
> >           gimple_seq seq = NULL;
> > -         if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> > +         if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> >             {
> > -             expr = build_int_cst (integer_type_node, vf);
> > -             expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> > +             /* When we're using loop_len produced by SELEC_VL, the non-final
> > +                iterations are not always processing VF elements. So vectorize
> > +                induction variable instead of
> > +
> > +                  _21 = vect_vec_iv_.6_22 + { VF, ... };
> > +
> > +                We should generate:
> > +
> > +                  _35 = .SELECT_VL (ivtmp_33, VF);
> > +                  vect_cst__22 = [vec_duplicate_expr] _35;
> > +                  _21 = vect_vec_iv_.6_22 + vect_cst__22; */
> > +             vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> > +             tree len
> > +               = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
> > +             expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
> > +                                                        unshare_expr (len)),
> > +                                          &seq, true, NULL_TREE);
> >             }
> >           else
> > -           expr = build_int_cst (TREE_TYPE (step_expr), vf);
> > +           {
> > +             bool float_p = SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr));
> > +             expr = build_int_cst (float_p ? integer_type_node
> > +                                           : TREE_TYPE (step_expr),
> > +                                   vf);
> > +             if (float_p)
> > +               expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> > +           }
> > +
>
> I meant you keep the existing flow in the function, specifically
> I think you should handle SCALAR_FLOAT_TYPE_P like it was previously
> handled, just build 'vf' in the dynamic way.
>
> >           new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr),
> >                                    expr, step_expr);
> >           if (seq)
> >             {
> > -             new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> > -             gcc_assert (!new_bb);
> > +             if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> > +               gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
> > +             else
> > +               {
> > +                 new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> > +                 gcc_assert (!new_bb);
> > +               }
> >             }
> >         }
> >
> > @@ -10335,9 +10363,9 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >       gcc_assert (CONSTANT_CLASS_P (new_name)
> >                   || TREE_CODE (new_name) == SSA_NAME);
> >       new_vec = build_vector_from_val (step_vectype, t);
> > -     vec_step = vect_init_vector (loop_vinfo, stmt_info,
> > -                                  new_vec, step_vectype, NULL);
> > -
> > +     vec_step
> > +       = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype,
> > +                           LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? &si : NULL);
>
> again this makes the flow hard to follow. I suppose refactoring this
> overall to
>
>   if (nested_in_vect_loop)
>     ...
>   else if (LOOP_VINFO_USING_SELECT_VL_P (..))
>     ...
>   else
>     ...
>
> and duplicating this tail into the cases makes it easier to follow.
>
> For nested_in_vect_loop we never have LOOP_VINFO_USING_SELECT_VL_P?
>
> Richard.
>
> >
> > Thanks.
> >
> >
> > juzhe.zhong@rivai.ai
> >
> > From: Richard Biener
> > Date: 2023-11-09 20:16
> > To: Juzhe-Zhong
> > CC: gcc-patches; richard.sandiford; rguenther; kito.cheng; kito.cheng
> > Subject: Re: [PATCH] Middle-end: Fix bug of induction variable vectorization for RVV
> > On Wed, Nov 8, 2023 at 11:53 AM Juzhe-Zhong wrote:
> > >
> > > PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
> > >
> > > The SELECT_VL result is not necessarily VF in non-final iterations.
> > >
> > > Current GIMPLE IR is wrong:
> > >
> > > # vect_vec_iv_.21_25 = PHI <_24(4), { 0, 1, 2, ... }(3)>
> > > ...
> > > _24 = vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... };
> > >
> > > After this patch, which is correct for SELECT_VL:
> > >
> > > # vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
> > > ...
> > > _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
> > > _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
> > >
> > > kito, could you give more explanation?
> > >
> > > PR middle/112438
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-vect-loop.cc (vectorizable_induction): Fix bug.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/riscv/rvv/autovec/pr112438.c: New test.
> > >
> > > ---
> > >  .../gcc.target/riscv/rvv/autovec/pr112438.c | 35 +++++++++++++++++
> > >  gcc/tree-vect-loop.cc                       | 39 +++++++++++++++----
> > >  2 files changed, 67 insertions(+), 7 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> > > new file mode 100644
> > > index 00000000000..b326d56a52c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
> > > @@ -0,0 +1,35 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
> > > +
> > > +void
> > > +foo (int n, int *__restrict in, int *__restrict out)
> > > +{
> > > +  for (int i = 0; i < n; i += 1)
> > > +    {
> > > +      out[i] = in[i] + i;
> > > +    }
> > > +}
> > > +
> > > +void
> > > +foo2 (int n, float * __restrict in,
> > > +float * __restrict out)
> > > +{
> > > +  for (int i = 0; i < n; i += 1)
> > > +    {
> > > +      out[i] = in[i] + i;
> > > +    }
> > > +}
> > > +
> > > +void
> > > +foo3 (int n, float * __restrict in,
> > > +float * __restrict out, float x)
> > > +{
> > > +  for (int i = 0; i < n; i += 1)
> > > +    {
> > > +      out[i] = in[i] + i* i;
> > > +    }
> > > +}
> > > +
> > > +/* We don't want to see vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }. */
> > > +/* { dg-final { scan-tree-dump-not "\\+ \{ POLY_INT_CST" "optimized" } } */
> > > +
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index a544bc9b059..3e103946168 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -10309,10 +10309,30 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> > >         new_name = step_expr;
> > >       else
> > >         {
> > > +         gimple_seq seq = NULL;
> > > +         if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> > > +           {
> > > +             /* When we're using loop_len produced by SELEC_VL, the non-final
> > > +                iterations are not always processing VF elements. So vectorize
> > > +                induction variable instead of
> > > +
> > > +                  _21 = vect_vec_iv_.6_22 + { VF, ... };
> > > +
> > > +                We should generate:
> > > +
> > > +                  _35 = .SELECT_VL (ivtmp_33, VF);
> > > +                  vect_cst__22 = [vec_duplicate_expr] _35;
> > > +                  _21 = vect_vec_iv_.6_22 + vect_cst__22; */
> > > +             vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> > > +             tree len
> > > +               = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
> > > +             expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
> > > +                                                        unshare_expr (len)),
> > > +                                          &seq, true, NULL_TREE);
> > > +           }
> >
> > I think it would be better to split out building a tree from VF from both
> > arms and avoid using 'vf' when LOOP_VINFO_USING_SELECT_VL_P.
> >
> > Btw, you are not patching the SLP path here which I believe has the same
> > problem but is currently exempt from non-constant VF at least.
> >
> > Richard.
> >
> > >           /* iv_loop is the loop to be vectorized. Generate:
> > >              vec_step = [VF*S, VF*S, VF*S, VF*S] */
> > > -         gimple_seq seq = NULL;
> > > -         if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> > > +         else if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
> > >             {
> > >               expr = build_int_cst (integer_type_node, vf);
> > >               expr = gimple_build (&seq, FLOAT_EXPR, TREE_TYPE (step_expr), expr);
> > > @@ -10323,8 +10343,13 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> > >                                    expr, step_expr);
> > >           if (seq)
> > >             {
> > > -             new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> > > -             gcc_assert (!new_bb);
> > > +             if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> > > +               gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
> > > +             else
> > > +               {
> > > +                 new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
> > > +                 gcc_assert (!new_bb);
> > > +               }
> > >             }
> > >         }
> > >
> > > @@ -10332,9 +10357,9 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> > >       gcc_assert (CONSTANT_CLASS_P (new_name)
> > >                   || TREE_CODE (new_name) == SSA_NAME);
> > >       new_vec = build_vector_from_val (step_vectype, t);
> > > -     vec_step = vect_init_vector (loop_vinfo, stmt_info,
> > > -                                  new_vec, step_vectype, NULL);
> > > -
> > > +     vec_step
> > > +       = vect_init_vector (loop_vinfo, stmt_info, new_vec, step_vectype,
> > > +                           LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) ? &si : NULL);
> > >
> > >   /* Create the following def-use cycle:
> > >      loop prolog:
> > > --
> > > 2.36.3
> > >
> > >
> >
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
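
The point of the patch (a non-final iteration under SELECT_VL may process fewer than VF elements, so the induction vector has to advance by the selected length rather than by VF) can be illustrated outside of GCC with a small scalar model. This is only a sketch under stated assumptions: select_vl below is a hypothetical stand-in for .SELECT_VL that deliberately returns less than VF when VF < remaining < 2 * VF, the step S is taken to be 1, and VF, MAX_N, run_iv_loop and count_mismatches are illustrative names that do not appear in the GCC sources.

```c
#include <assert.h>

#define VF 4      /* assumed vectorization factor for this model */
#define MAX_N 64  /* assumed upper bound on the trip count */

/* Hypothetical stand-in for .SELECT_VL: returns a length in
   [1, min (remaining, VF)], deliberately below VF when
   VF < remaining < 2 * VF.  */
static int
select_vl (int remaining)
{
  if (remaining >= 2 * VF)
    return VF;
  if (remaining > VF)
    return (remaining + 1) / 2;
  return remaining;
}

/* Compute out[i] = in[i] + i with a VF-lane induction vector.
   A nonzero step_by_len advances the vector by the selected length,
   otherwise it advances by the constant VF.  */
static void
run_iv_loop (int n, const int *in, int *out, int step_by_len)
{
  int iv[VF];
  for (int lane = 0; lane < VF; lane++)
    iv[lane] = lane;

  for (int done = 0; done < n;)
    {
      int len = select_vl (n - done);
      assert (len > 0 && len <= VF);
      /* Only the first LEN lanes are active in this iteration.  */
      for (int lane = 0; lane < len; lane++)
        out[done + lane] = in[done + lane] + iv[lane];
      int step = step_by_len ? len : VF;
      for (int lane = 0; lane < VF; lane++)
        iv[lane] += step;
      done += len;
    }
}

/* Count elements that differ from the scalar result in[i] + i.  */
static int
count_mismatches (int n, int step_by_len)
{
  int in[MAX_N] = { 0 }, out[MAX_N] = { 0 };
  assert (n >= 0 && n <= MAX_N);
  run_iv_loop (n, in, out, step_by_len);
  int bad = 0;
  for (int i = 0; i < n; i++)
    bad += out[i] != in[i] + i;
  return bad;
}
```

In this model, n = 6 is split into two iterations of length 3. Stepping by VF leaves the second iteration adding 4, 5, 6 instead of 3, 4, 5, so count_mismatches (6, 0) reports 3 wrong elements, while count_mismatches (6, 1) reports none. With n = 8 both iterations process exactly VF elements and the constant step happens to be correct.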