[gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] RISC-V: Support movmisalign of RVV VLA modes


* [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] RISC-V: Support movmisalign of RVV VLA modes
@ 2023-10-12 21:58 Jeff Law
  0 siblings, 0 replies; only message in thread
From: Jeff Law @ 2023-10-12 21:58 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:20c88fcbe10f9fdfc1487ebdad9744196c4569ce

commit 20c88fcbe10f9fdfc1487ebdad9744196c4569ce
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Mon Oct 9 20:07:07 2023 +0800

    RISC-V: Support movmisalign of RVV VLA modes
    
    This patch fixes the following FAILs in regression testing:
    FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
    FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
    FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
    FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
    FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
    FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
    FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
    FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
    FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"
    
    Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
    https://github.com/gcc-mirror/gcc/commit/f7bff24905a6959f85f866390db2fff1d6f95520
    
    At the time I believed that RVV does not allow misaligned accesses, so I removed that pattern.
    However, after further investigation, re-reading the RVV ISA spec, and experimenting on SPIKE,
    I realized I was wrong.
    
    RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints
    
    "If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
     either the element is transferred successfully or an address misaligned exception is raised on that element."
    
    So the RVV ISA does permit misaligned vector loads/stores: an implementation either performs the
    access or raises an address-misaligned exception.
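
    As a minimal sketch of what "naturally aligned to the size of the element" means in the quote
    above: an element is naturally aligned when its address is a multiple of its size. The helper
    name naturally_aligned_p is hypothetical; it is not part of GCC or of the RVV spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only, not a GCC or RVV API).
   Returns true when ADDR is a multiple of ELEM_SIZE, i.e. an element of
   ELEM_SIZE bytes stored at ADDR is naturally aligned.
   ELEM_SIZE is assumed to be a nonzero power of two (1, 2, 4 or 8 for RVV).  */
static bool
naturally_aligned_p (uintptr_t addr, size_t elem_size)
{
  /* Guard the power-of-two assumption that the mask trick relies on.  */
  assert (elem_size != 0 && (elem_size & (elem_size - 1)) == 0);
  return (addr & (elem_size - 1)) == 0;
}
```

    For example, an address ending in 0x4 is naturally aligned for a 4-byte element but not for an
    8-byte one; only in the second case does the clause quoted above apply.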
    
    Experiments on SPIKE confirm this:
    
    [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64  a.out
    bbl loader
    z  0000000000000000 ra 0000000000010158 sp 0000003ffffffb40 gp 0000000000012c48
    tp 0000000000000000 t0 00000000000110da t1 000000000000000f t2 0000000000000000
    s0 0000000000013460 s1 0000000000000000 a0 0000000000012ef5 a1 0000000000012018
    a2 0000000000012a71 a3 000000000000000d a4 0000000000000004 a5 0000000000012a71
    a6 0000000000012a71 a7 0000000000012018 s2 0000000000000000 s3 0000000000000000
    s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
    s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
    t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
    pc 0000000000010258 va/inst 00000000020660a7 sr 8000000200006620
    Store/AMO access fault!
    
    [jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64  a.out
    bbl loader
    
    With --misaligned specified, SPIKE passes the execution tests that previously *FAILED*.
    
    So, to honor the RVV ISA spec, we should add the movmisalign pattern back, based on the investigation
    above: it improves multiple vectorization tests and fixes the dump FAILs.
    
    This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether we support the movmisalign pattern for VLA modes (it is enabled by default).
    
    Consider the following case:
    
    struct s {
        unsigned i : 31;
        char a : 4;
    };
    
    #define N 32
    #define ELT0 {0x7FFFFFFFUL, 0}
    #define ELT1 {0x7FFFFFFFUL, 1}
    #define ELT2 {0x7FFFFFFFUL, 2}
    #define ELT3 {0x7FFFFFFFUL, 3}
    #define RES 48
    struct s A[N]
      = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
          ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
          ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
          ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
    
    int __attribute__ ((noipa))
    f(struct s *ptr, unsigned n) {
        int res = 0;
        for (int i = 0; i < n; ++i)
          res += ptr[i].a;
        return res;
    }
    
    -O3 -S -fno-vect-cost-model (default strict-align):
    
    f:
            mv      a4,a0
            beq     a1,zero,.L9
            addiw   a5,a1,-1
            li      a3,14
            vsetivli        zero,16,e64,m8,ta,ma
            bleu    a5,a3,.L3
            andi    a5,a0,127
            bne     a5,zero,.L3
            srliw   a3,a1,4
            slli    a3,a3,7
            li      a0,15
            slli    a0,a0,32
            add     a3,a3,a4
            mv      a5,a4
            li      a2,32
            vmv.v.x v16,a0
            vsetvli zero,zero,e32,m4,ta,ma
            vmv.v.i v4,0
    .L4:
            vsetvli zero,zero,e64,m8,ta,ma
            vle64.v v8,0(a5)
            addi    a5,a5,128
            vand.vv v8,v8,v16
            vsetvli zero,zero,e32,m4,ta,ma
            vnsrl.wx        v8,v8,a2
            vadd.vv v4,v4,v8
            bne     a5,a3,.L4
            li      a3,0
            andi    a5,a1,15
            vmv.s.x v1,a3
            andi    a3,a1,-16
            vredsum.vs      v1,v4,v1
            vmv.x.s a0,v1
            mv      a2,a0
            beq     a5,zero,.L15
            slli    a5,a3,3
            add     a5,a4,a5
            lw      a0,4(a5)
            andi    a0,a0,15
            addiw   a4,a3,1
            addw    a0,a0,a2
            bgeu    a4,a1,.L15
            lw      a2,12(a5)
            andi    a2,a2,15
            addiw   a4,a3,2
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a2,20(a5)
            andi    a2,a2,15
            addiw   a4,a3,3
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a2,28(a5)
            andi    a2,a2,15
            addiw   a4,a3,4
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a2,36(a5)
            andi    a2,a2,15
            addiw   a4,a3,5
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a2,44(a5)
            andi    a2,a2,15
            addiw   a4,a3,6
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a2,52(a5)
            andi    a2,a2,15
            addiw   a4,a3,7
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a4,60(a5)
            andi    a4,a4,15
            addw    a4,a4,a0
            addiw   a2,a3,8
            mv      a0,a4
            bgeu    a2,a1,.L15
            lw      a0,68(a5)
            andi    a0,a0,15
            addiw   a2,a3,9
            addw    a0,a0,a4
            bgeu    a2,a1,.L15
            lw      a2,76(a5)
            andi    a2,a2,15
            addiw   a4,a3,10
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a2,84(a5)
            andi    a2,a2,15
            addiw   a4,a3,11
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a2,92(a5)
            andi    a2,a2,15
            addiw   a4,a3,12
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a2,100(a5)
            andi    a2,a2,15
            addiw   a4,a3,13
            addw    a0,a2,a0
            bgeu    a4,a1,.L15
            lw      a4,108(a5)
            andi    a4,a4,15
            addiw   a3,a3,14
            addw    a0,a4,a0
            bgeu    a3,a1,.L15
            lw      a5,116(a5)
            andi    a5,a5,15
            addw    a0,a5,a0
            ret
    .L9:
            li      a0,0
    .L15:
            ret
    .L3:
            mv      a5,a4
            slli    a4,a1,32
            srli    a1,a4,29
            add     a1,a5,a1
            li      a0,0
    .L7:
            lw      a4,4(a5)
            andi    a4,a4,15
            addi    a5,a5,8
            addw    a0,a4,a0
            bne     a5,a1,.L7
            ret
    
    -O3 -S -mno-strict-align -fno-vect-cost-model:
    
    f:
            beq     a1,zero,.L4
            slli    a1,a1,32
            li      a5,15
            vsetvli a4,zero,e64,m1,ta,ma
            slli    a5,a5,32
            srli    a1,a1,32
            li      a6,32
            vmv.v.x v3,a5
            vsetvli zero,zero,e32,mf2,ta,ma
            vmv.v.i v2,0
    .L3:
            vsetvli a5,a1,e64,m1,ta,ma
            vle64.v v1,0(a0)
            vsetvli a3,zero,e64,m1,ta,ma
            slli    a2,a5,3
            vand.vv v1,v1,v3
            sub     a1,a1,a5
            vsetvli zero,zero,e32,mf2,ta,ma
            add     a0,a0,a2
            vnsrl.wx        v1,v1,a6
            vsetvli zero,a5,e32,mf2,tu,ma
            vadd.vv v2,v2,v1
            bne     a1,zero,.L3
            li      a5,0
            vsetvli a3,zero,e32,mf2,ta,ma
            vmv.s.x v1,a5
            vredsum.vs      v2,v2,v1
            vmv.x.s a0,v2
            ret
    .L4:
            li      a0,0
            ret
    
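    The codegen for this case improves considerably: the runtime alignment check and the unrolled
    scalar epilogue are gone, replaced by a single length-controlled vector loop.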
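
    To make the -mno-strict-align loop above easier to follow, here is a scalar sketch of what each
    vector lane computes. It assumes the layout implied by the assembly (8-byte struct, little-endian,
    field a in bits 32..35 of the 64-bit element); lane_model and f_model are hypothetical names used
    only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct s {
    unsigned i : 31;
    char a : 4;
};

/* Scalar model of one lane of the vector loop (illustration only).
   Assumes sizeof (struct s) == 8, little-endian byte order, and that
   field `a' occupies bits 32..35 of the element, as the asm implies.  */
static uint32_t
lane_model (const struct s *p)
{
  uint64_t elem;
  assert (sizeof (struct s) == sizeof elem);
  memcpy (&elem, p, sizeof elem);   /* vle64.v: load one 64-bit element  */
  elem &= (uint64_t) 15 << 32;      /* vand.vv with the splat of 15 << 32  */
  return (uint32_t) (elem >> 32);   /* vnsrl.wx: narrow to 32 bits, shift by 32  */
}

/* Hypothetical reference for the whole loop: accumulate every lane
   (vadd.vv in the loop, vredsum.vs after it).  */
static int
f_model (const struct s *ptr, unsigned n)
{
  uint32_t res = 0;
  for (unsigned i = 0; i < n; ++i)
    res += lane_model (&ptr[i]);
  return (int) res;
}
```

    Under those layout assumptions f_model should agree with f from the test case; for the array A
    above, f sums 8 * (0 + 1 + 2 + 3) = 48, which matches RES.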
    
    gcc/ChangeLog:
    
            * config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED): New macro.
            * config/riscv/riscv.cc (riscv_support_vector_misalignment): Depend on movmisalign pattern.
            * config/riscv/vector.md (movmisalign<mode>): New pattern.
    
    (cherry picked from commit dee55cf59ceea989f47e7605205c6644b27a1f78)

Diff:
---
 gcc/config/riscv/riscv-opts.h |  3 +++
 gcc/config/riscv/riscv.cc     | 13 +------------
 gcc/config/riscv/vector.md    | 13 +++++++++++++
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 7e4b0cc6fe1..119fe06e86a 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -116,4 +116,7 @@ enum riscv_entity
 #define TARGET_VECTOR_VLS                                                      \
   (TARGET_VECTOR && riscv_autovec_preference == RVV_SCALABLE)
 
+/* TODO: Enable RVV movmisalign by default for now.  */
+#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2b839241f1a..55c6b2f264d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9021,18 +9021,7 @@ riscv_support_vector_misalignment (machine_mode mode,
 				   int misalignment,
 				   bool is_packed ATTRIBUTE_UNUSED)
 {
-  /* Only enable misalign data movements for VLS modes.  */
-  if (TARGET_VECTOR_VLS && STRICT_ALIGNMENT)
-    {
-      /* Return if movmisalign pattern is not supported for this mode.  */
-      if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
-	return false;
-
-      /* Misalignment factor is unknown at compile time.  */
-      if (misalignment == -1)
-	return false;
-    }
-  /* Disable movmisalign for VLA auto-vectorization.  */
+  /* Depend on movmisalign pattern.  */
   return default_builtin_support_vector_misalignment (mode, type, misalignment,
 						      is_packed);
 }
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index cf5c0a40257..aa8a0ba7865 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1326,6 +1326,19 @@
   }
 )
 
+;; According to RVV ISA:
+;; If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
+;; either the element is transferred successfully or an address misaligned exception is raised on that element.
+(define_expand "movmisalign<mode>"
+  [(set (match_operand:V 0 "nonimmediate_operand")
+	(match_operand:V 1 "general_operand"))]
+  "TARGET_VECTOR && TARGET_VECTOR_MISALIGN_SUPPORTED"
+  {
+    emit_move_insn (operands[0], operands[1]);
+    DONE;
+  }
+)
+
 ;; -----------------------------------------------------------------
 ;; ---- Duplicate Operations
 ;; -----------------------------------------------------------------
