From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Support movmisalign of RVV VLA modes
Date: Mon, 9 Oct 2023 20:07:07 +0800
Message-Id: <20231009120707.2746-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

This patch fixes the following FAILs in the regression tests:

FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"

Previously, I removed the movmisalign pattern to fix execution FAILs in this commit:
https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520

At the time I thought RVV did not allow misaligned accesses, so I removed the pattern.
However, after further investigation, re-reading the RVV ISA, and experimenting on SPIKE, I realized I was wrong.

RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints

"If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
either the element is transferred successfully or an address misaligned exception is raised on that element."

So the RVV ISA does allow misaligned vector loads/stores; whether such an access succeeds or traps depends on the implementation.
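
To make the quoted rule concrete, here is a minimal C sketch (not part of the patch) of what "naturally aligned to the size of the element" means. The helper name is_naturally_aligned is hypothetical and exists only for illustration; it assumes the element size is a power of two, as it is for the RVV element widths of 1, 2, 4 and 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not part of GCC or this patch).
   Returns 1 if ADDR is a multiple of ELT_SIZE, i.e. an element of that size
   stored at ADDR is naturally aligned, and 0 otherwise.
   ELT_SIZE is assumed to be a non-zero power of two.  */
static int
is_naturally_aligned (uintptr_t addr, size_t elt_size)
{
  assert (elt_size != 0 && (elt_size & (elt_size - 1)) == 0);
  return (addr & (elt_size - 1)) == 0;
}
```

For instance, a 64-bit element at address 0x1004 is not naturally aligned, while a 32-bit element at the same address is. When the check returns 0, the quoted rule applies: the implementation may either complete the access or raise an address misaligned exception.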
I also confirmed this experimentally on SPIKE:

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader
z  0000000000000000 ra 0000000000010158 sp 0000003ffffffb40 gp 0000000000012c48
tp 0000000000000000 t0 00000000000110da t1 000000000000000f t2 0000000000000000
s0 0000000000013460 s1 0000000000000000 a0 0000000000012ef5 a1 0000000000012018
a2 0000000000012a71 a3 000000000000000d a4 0000000000000004 a5 0000000000012a71
a6 0000000000012a71 a7 0000000000012018 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 0000000000010258 va/inst 00000000020660a7 sr 8000000200006620
Store/AMO access fault!

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader

SPIKE passes the previously *FAILED* execution tests when it is run with --misaligned.

So, to honor the RVV ISA spec, we should add the movmisalign pattern back, based on the investigation above.
Doing so improves multiple vectorization tests and fixes the dump FAILs.

This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether the movmisalign pattern is supported for VLA modes (it is enabled by default).

Consider the following case:

struct s {
    unsigned i : 31;
    char a : 4;
};

#define N 32
#define ELT0 {0x7FFFFFFFUL, 0}
#define ELT1 {0x7FFFFFFFUL, 1}
#define ELT2 {0x7FFFFFFFUL, 2}
#define ELT3 {0x7FFFFFFFUL, 3}
#define RES 48
struct s A[N]
  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};

int __attribute__ ((noipa))
f(struct s *ptr, unsigned n) {
    int res = 0;
    for (int i = 0; i < n; ++i)
      res += ptr[i].a;
    return res;
}

-O3 -S -fno-vect-cost-model (default strict-align):

f:
        mv a4,a0
        beq a1,zero,.L9
        addiw a5,a1,-1
        li a3,14
        vsetivli zero,16,e64,m8,ta,ma
        bleu a5,a3,.L3
        andi a5,a0,127
        bne a5,zero,.L3
        srliw a3,a1,4
        slli a3,a3,7
        li a0,15
        slli a0,a0,32
        add a3,a3,a4
        mv a5,a4
        li a2,32
        vmv.v.x v16,a0
        vsetvli zero,zero,e32,m4,ta,ma
        vmv.v.i v4,0
.L4:
        vsetvli zero,zero,e64,m8,ta,ma
        vle64.v v8,0(a5)
        addi a5,a5,128
        vand.vv v8,v8,v16
        vsetvli zero,zero,e32,m4,ta,ma
        vnsrl.wx v8,v8,a2
        vadd.vv v4,v4,v8
        bne a5,a3,.L4
        li a3,0
        andi a5,a1,15
        vmv.s.x v1,a3
        andi a3,a1,-16
        vredsum.vs v1,v4,v1
        vmv.x.s a0,v1
        mv a2,a0
        beq a5,zero,.L15
        slli a5,a3,3
        add a5,a4,a5
        lw a0,4(a5)
        andi a0,a0,15
        addiw a4,a3,1
        addw a0,a0,a2
        bgeu a4,a1,.L15
        lw a2,12(a5)
        andi a2,a2,15
        addiw a4,a3,2
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a2,20(a5)
        andi a2,a2,15
        addiw a4,a3,3
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a2,28(a5)
        andi a2,a2,15
        addiw a4,a3,4
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a2,36(a5)
        andi a2,a2,15
        addiw a4,a3,5
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a2,44(a5)
        andi a2,a2,15
        addiw a4,a3,6
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a2,52(a5)
        andi a2,a2,15
        addiw a4,a3,7
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a4,60(a5)
        andi a4,a4,15
        addw a4,a4,a0
        addiw a2,a3,8
        mv a0,a4
        bgeu a2,a1,.L15
        lw a0,68(a5)
        andi a0,a0,15
        addiw a2,a3,9
        addw a0,a0,a4
        bgeu a2,a1,.L15
        lw a2,76(a5)
        andi a2,a2,15
        addiw a4,a3,10
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a2,84(a5)
        andi a2,a2,15
        addiw a4,a3,11
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a2,92(a5)
        andi a2,a2,15
        addiw a4,a3,12
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a2,100(a5)
        andi a2,a2,15
        addiw a4,a3,13
        addw a0,a2,a0
        bgeu a4,a1,.L15
        lw a4,108(a5)
        andi a4,a4,15
        addiw a3,a3,14
        addw a0,a4,a0
        bgeu a3,a1,.L15
        lw a5,116(a5)
        andi a5,a5,15
        addw a0,a5,a0
        ret
.L9:
        li a0,0
.L15:
        ret
.L3:
        mv a5,a4
        slli a4,a1,32
        srli a1,a4,29
        add a1,a5,a1
        li a0,0
.L7:
        lw a4,4(a5)
        andi a4,a4,15
        addi a5,a5,8
        addw a0,a4,a0
        bne a5,a1,.L7
        ret

-O3 -S -mno-strict-align -fno-vect-cost-model:

f:
        beq a1,zero,.L4
        slli a1,a1,32
        li a5,15
        vsetvli a4,zero,e64,m1,ta,ma
        slli a5,a5,32
        srli a1,a1,32
        li a6,32
        vmv.v.x v3,a5
        vsetvli zero,zero,e32,mf2,ta,ma
        vmv.v.i v2,0
.L3:
        vsetvli a5,a1,e64,m1,ta,ma
        vle64.v v1,0(a0)
        vsetvli a3,zero,e64,m1,ta,ma
        slli a2,a5,3
        vand.vv v1,v1,v3
        sub a1,a1,a5
        vsetvli zero,zero,e32,mf2,ta,ma
        add a0,a0,a2
        vnsrl.wx v1,v1,a6
        vsetvli zero,a5,e32,mf2,tu,ma
        vadd.vv v2,v2,v1
        bne a1,zero,.L3
        li a5,0
        vsetvli a3,zero,e32,mf2,ta,ma
        vmv.s.x v1,a5
        vredsum.vs v2,v2,v1
        vmv.x.s a0,v2
        ret
.L4:
        li a0,0
        ret

With -mno-strict-align the vectorized loop no longer needs the runtime alignment check or the long scalar epilogue, so the codegen for this case improves significantly.

gcc/ChangeLog:

        * config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED): New macro.
        * config/riscv/riscv.cc (riscv_support_vector_misalignment): Depend on movmisalign pattern.
        * config/riscv/vector.md (movmisalign<mode>): New pattern.

---
 gcc/config/riscv/riscv-opts.h |  3 +++
 gcc/config/riscv/riscv.cc     | 13 +------------
 gcc/config/riscv/vector.md    | 13 +++++++++++++
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 7e4b0cc6fe1..119fe06e86a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -116,4 +116,7 @@ enum riscv_entity
 #define TARGET_VECTOR_VLS \
   (TARGET_VECTOR && riscv_autovec_preference == RVV_SCALABLE)
 
+/* TODO: Enable RVV movmisalign by default for now. */
+#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2b839241f1a..55c6b2f264d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9021,18 +9021,7 @@ riscv_support_vector_misalignment (machine_mode mode,
                                    int misalignment,
                                    bool is_packed ATTRIBUTE_UNUSED)
 {
-  /* Only enable misalign data movements for VLS modes. */
-  if (TARGET_VECTOR_VLS && STRICT_ALIGNMENT)
-    {
-      /* Return if movmisalign pattern is not supported for this mode. */
-      if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
-        return false;
-
-      /* Misalignment factor is unknown at compile time. */
-      if (misalignment == -1)
-        return false;
-    }
-  /* Disable movmisalign for VLA auto-vectorization. */
+  /* Depend on movmisalign pattern. */
   return default_builtin_support_vector_misalignment (mode, type, misalignment,
                                                       is_packed);
 }
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index cf5c0a40257..aa8a0ba7865 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1326,6 +1326,19 @@
   }
 )
 
+;; According to RVV ISA:
+;; If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
+;; either the element is transferred successfully or an address misaligned exception is raised on that element.
+(define_expand "movmisalign<mode>"
+  [(set (match_operand:V 0 "nonimmediate_operand")
+        (match_operand:V 1 "general_operand"))]
+  "TARGET_VECTOR && TARGET_VECTOR_MISALIGN_SUPPORTED"
+  {
+    emit_move_insn (operands[0], operands[1]);
+    DONE;
+  }
+)
+
 ;; -----------------------------------------------------------------
 ;; ---- Duplicate Operations
 ;; -----------------------------------------------------------------
-- 
2.36.3