From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpbgsg1.qq.com (smtpbgsg1.qq.com [54.254.200.92]) by sourceware.org (Postfix) with ESMTPS id 432653858D28 for ; Sat, 6 May 2023 11:14:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 432653858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai X-QQ-mid: bizesmtp81t1683371691tklnclcx Received: from rios-cad5.localdomain ( [58.60.1.11]) by bizesmtp.qq.com (ESMTP) with id ; Sat, 06 May 2023 19:14:50 +0800 (CST) X-QQ-SSF: 01400000000000F0Q000000A0000000 X-QQ-FEAT: CR3LFp2JE4lDbkXqJf4owRXjEttrjSQmnNIBBSzn/OGFZ0dCrI8KcdyWcIG42 P+IaKdTWznRM8udjYc05rePV7JW2bnfP0DomBNmdLmw1XT69CYzj5B8heL8EWNn75JBJaGo ujTlFRlcJN3JO6xSviOa5X6M5lqUgx+L9gbc0KNy1lUjbT7l3d79VesWy0HF/yB34P+8cU0 jbHNuFAKmGkQlp58jvyvWbGO4cK9ovf1KhPkWvTN9NR6yTjsbeXdRMcdZayehtqt/x3TBy7 WQLXw8v2vw1I+HKn9Ic8xgrvCqNwW3/3o9YgbiblD8mxHA750kfaSkVlk5waVjgFBpnYn1x FUfAphIQzbZB9noxTLySLK8gOgy0PETh+St898RH8jjCDfdbo/QqGrFnR0EMsKB8TQ1DGBp y2Y7/1ZfSjh2aV52u22tHw== X-QQ-GoodBg: 2 X-BIZMAIL-ID: 9419477481845101609 From: juzhe.zhong@rivai.ai To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, Juzhe-Zhong Subject: [PATCH] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli [PR 109743] Date: Sat, 6 May 2023 19:14:49 +0800 Message-Id: <20230506111449.2128575-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-9.0 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_STATUS,KAM_SHORT,LIKELY_SPAM_BODY,RCVD_IN_BARRACUDACENTRAL,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SCC_10_SHORT_WORD_LINES,SCC_5_SHORT_WORD_LINES,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: From: Juzhe-Zhong This patch is fixing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743. This issue happens is because we are currently very conservative in optimization of user vsetvli. Consider this following case: bb 1: vsetvli a5,a4... (demand AVL = a4). bb 2: RVV insn use a5 (demand AVL = a5). LCM will hoist vsetvl of bb 2 into bb 1. We don't do AVL propagation for this situation since it's complicated that we should analyze the code sequence between vsetvli in bb 1 and RVV insn in bb 2. They are not necessary the consecutive blocks. This patch is doing the optimizations after LCM, we will check and eliminate the vsetvli in LCM inserted edge if such vsetvli is redundant. Such approach is much simplier and safe. code: void foo2 (int32_t *a, int32_t *b, int n) { if (n <= 0) return; int i = n; size_t vl = __riscv_vsetvl_e32m1 (i); for (; i >= 0; i--) { vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); __riscv_vse32_v_i32m1 (b, v, vl); if (i >= vl) continue; if (i == 0) return; vl = __riscv_vsetvl_e32m1 (i); } } Before this patch: foo2: .LFB2: .cfi_startproc ble a2,zero,.L1 mv a4,a2 li a3,-1 vsetvli a5,a2,e32,m1,ta,mu vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated. .L5: vle32.v v1,0(a0) vse32.v v1,0(a1) bgeu a4,a5,.L3 .L10: beq a2,zero,.L1 vsetvli a5,a4,e32,m1,ta,mu addi a4,a4,-1 vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated. vle32.v v1,0(a0) vse32.v v1,0(a1) addiw a2,a2,-1 bltu a4,a5,.L10 .L3: addiw a2,a2,-1 addi a4,a4,-1 bne a2,a3,.L5 .L1: ret After this patch: f: ble a2,zero,.L1 mv a4,a2 li a3,-1 vsetvli a5,a2,e32,m1,ta,ma .L5: vle32.v v1,0(a0) vse32.v v1,0(a1) bgeu a4,a5,.L3 .L10: beq a2,zero,.L1 vsetvli a5,a4,e32,m1,ta,ma addi a4,a4,-1 vle32.v v1,0(a0) vse32.v v1,0(a1) addiw a2,a2,-1 bltu a4,a5,.L10 .L3: addiw a2,a2,-1 addi a4,a4,-1 bne a2,a3,.L5 .L1: ret PR target/109743 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::commit_vsetvls): Add optimization for LCM inserted edge. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc | 42 +++++++++++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c | 26 ++++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c | 27 ++++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c | 28 +++++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-4.c | 28 +++++++++++++ 5 files changed, 151 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index f55907a410e..fcee7fdf323 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -3834,6 +3834,48 @@ pass_vsetvl::commit_vsetvls (void) const vector_insn_info *require = m_vector_manager->vector_exprs[i]; gcc_assert (require->valid_or_dirty_p ()); + + /* Here we optimize the VSETVL is hoisted by LCM: + + Before LCM: + bb 1: + vsetvli a5,a2,e32,m1,ta,mu + bb 2: + vsetvli zero,a5,e32,m1,ta,mu + ... + + After LCM: + bb 1: + vsetvli a5,a2,e32,m1,ta,mu + LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate + bb 2: + ... + */ + const basic_block pred_cfg_bb = eg->src; + const auto block_info + = m_vector_manager->vector_block_infos[pred_cfg_bb->index]; + const insn_info *pred_insn = block_info.reaching_out.get_insn (); + if (pred_insn && vsetvl_insn_p (pred_insn->rtl ()) + && require->get_avl_source () + && require->get_avl_source ()->insn () + && require->skip_avl_compatible_p (block_info.reaching_out)) + { + vector_insn_info new_info = *require; + new_info.set_avl_info ( + block_info.reaching_out.get_avl_info ()); + new_info + = block_info.reaching_out.merge (new_info, LOCAL_MERGE); + change_vsetvl_insn (pred_insn, new_info); + bitmap_clear_bit (m_vector_manager->vector_insert[ed], i); + if (dump_file) + fprintf ( + dump_file, + "\nLCM INSERTED edge %d from bb %d to bb %d for VSETVL " + "expr[%ld] is removed\n", + ed, eg->src->index, eg->dest->index, i); + continue; + } + rtl_profile_for_edge (eg); start_sequence (); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c new file mode 100644 index 00000000000..f30275c8280 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e32m1 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c new file mode 100644 index 00000000000..5f6647bb916 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf4 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c new file mode 100644 index 00000000000..5dbc871ed12 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf2 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c new file mode 100644 index 00000000000..edd12855f58 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void +f (int32_t *a, int32_t *b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf4 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a + i, vl); + v = __riscv_vle32_v_i32m1_tu (v, a + i + 100, vl); + __riscv_vse32_v_i32m1 (b + i, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e8mf4 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ -- 2.36.3