public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] RISC-V: Fine tune gather load RA constraint
@ 2023-03-13  8:28 juzhe.zhong
  2023-03-14 18:08 ` Jeff Law
  2023-04-21 20:36 ` Jeff Law
  0 siblings, 2 replies; 8+ messages in thread
From: juzhe.zhong @ 2023-03-13  8:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, Ju-Zhe Zhong

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

For DEST EEW < SOURCE EEW, the destination register can partially
overlap the source register according to the RVV ISA.

gcc/ChangeLog:

        * config/riscv/vector.md: Fix RA constraint.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/narrow_constraint-12.c: New test.

---
 gcc/config/riscv/vector.md                    |  54 ++--
 .../riscv/rvv/base/narrow_constraint-12.c     | 303 ++++++++++++++++++
 2 files changed, 330 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/narrow_constraint-12.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 37a539b4852..4ea74372de5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1434,63 +1434,63 @@
 
 ;; DEST eew is smaller than SOURCE eew.
 (define_insn "@pred_indexed_<order>load<mode>_x2_smaller_eew"
-  [(set (match_operand:VEEWTRUNC2 0 "register_operand"                "=&vr,  &vr")
+  [(set (match_operand:VEEWTRUNC2 0 "register_operand"               "=vd, vd, vr, vr,  &vr,  &vr")
 	(if_then_else:VEEWTRUNC2
 	  (unspec:<VM>
-	    [(match_operand:<VM> 1 "vector_mask_operand"             "vmWc1,vmWc1")
-	     (match_operand 5 "vector_length_operand"                "   rK,   rK")
-	     (match_operand 6 "const_int_operand"                    "    i,    i")
-	     (match_operand 7 "const_int_operand"                    "    i,    i")
-	     (match_operand 8 "const_int_operand"                    "    i,    i")
+	    [(match_operand:<VM> 1 "vector_mask_operand"             " vm, vm,Wc1,Wc1,vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"                " rK, rK, rK, rK,   rK,   rK")
+	     (match_operand 6 "const_int_operand"                    "  i,  i,  i,  i,    i,    i")
+	     (match_operand 7 "const_int_operand"                    "  i,  i,  i,  i,    i,    i")
+	     (match_operand 8 "const_int_operand"                    "  i,  i,  i,  i,    i,    i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 	  (unspec:VEEWTRUNC2
-	    [(match_operand 3 "pmode_register_operand"               "    r,    r")
+	    [(match_operand 3 "pmode_register_operand"               "  r,  r,  r,  r,    r,    r")
 	     (mem:BLK (scratch))
-	     (match_operand:<VINDEX_DOUBLE_EXT> 4 "register_operand" "   vr,   vr")] ORDER)
-	  (match_operand:VEEWTRUNC2 2 "vector_merge_operand"         "   vu,    0")))]
+	     (match_operand:<VINDEX_DOUBLE_EXT> 4 "register_operand" "  0,  0,  0,  0,   vr,   vr")] ORDER)
+	  (match_operand:VEEWTRUNC2 2 "vector_merge_operand"         " vu,  0, vu,  0,   vu,    0")))]
   "TARGET_VECTOR"
   "vl<order>xei<double_ext_sew>.v\t%0,(%3),%4%p1"
   [(set_attr "type" "vld<order>x")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "@pred_indexed_<order>load<mode>_x4_smaller_eew"
-  [(set (match_operand:VEEWTRUNC4 0 "register_operand"              "=&vr,  &vr")
+  [(set (match_operand:VEEWTRUNC4 0 "register_operand"             "=vd, vd, vr, vr,  &vr,  &vr")
 	(if_then_else:VEEWTRUNC4
 	  (unspec:<VM>
-	    [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
-	     (match_operand 5 "vector_length_operand"              "   rK,   rK")
-	     (match_operand 6 "const_int_operand"                  "    i,    i")
-	     (match_operand 7 "const_int_operand"                  "    i,    i")
-	     (match_operand 8 "const_int_operand"                  "    i,    i")
+	    [(match_operand:<VM> 1 "vector_mask_operand"           " vm, vm,Wc1,Wc1,vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"              " rK, rK, rK, rK,   rK,   rK")
+	     (match_operand 6 "const_int_operand"                  "  i,  i,  i,  i,    i,    i")
+	     (match_operand 7 "const_int_operand"                  "  i,  i,  i,  i,    i,    i")
+	     (match_operand 8 "const_int_operand"                  "  i,  i,  i,  i,    i,    i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 	  (unspec:VEEWTRUNC4
-	    [(match_operand 3 "pmode_register_operand"             "    r,    r")
+	    [(match_operand 3 "pmode_register_operand"             "  r,  r,  r,  r,    r,    r")
 	     (mem:BLK (scratch))
-	     (match_operand:<VINDEX_QUAD_EXT> 4 "register_operand" "   vr,   vr")] ORDER)
-	  (match_operand:VEEWTRUNC4 2 "vector_merge_operand"       "   vu,    0")))]
+	     (match_operand:<VINDEX_QUAD_EXT> 4 "register_operand" "  0,  0,  0,  0,   vr,   vr")] ORDER)
+	  (match_operand:VEEWTRUNC4 2 "vector_merge_operand"       " vu,  0, vu,  0,   vu,    0")))]
   "TARGET_VECTOR"
   "vl<order>xei<quad_ext_sew>.v\t%0,(%3),%4%p1"
   [(set_attr "type" "vld<order>x")
    (set_attr "mode" "<MODE>")])
 
 (define_insn "@pred_indexed_<order>load<mode>_x8_smaller_eew"
-  [(set (match_operand:VEEWTRUNC8 0 "register_operand"             "=&vr,  &vr")
+  [(set (match_operand:VEEWTRUNC8 0 "register_operand"            "=vd, vd, vr, vr,  &vr,  &vr")
 	(if_then_else:VEEWTRUNC8
 	  (unspec:<VM>
-	    [(match_operand:<VM> 1 "vector_mask_operand"          "vmWc1,vmWc1")
-	     (match_operand 5 "vector_length_operand"             "   rK,   rK")
-	     (match_operand 6 "const_int_operand"                 "    i,    i")
-	     (match_operand 7 "const_int_operand"                 "    i,    i")
-	     (match_operand 8 "const_int_operand"                 "    i,    i")
+	    [(match_operand:<VM> 1 "vector_mask_operand"          " vm, vm,Wc1,Wc1,vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"             " rK, rK, rK, rK,   rK,   rK")
+	     (match_operand 6 "const_int_operand"                 "  i,  i,  i,  i,    i,    i")
+	     (match_operand 7 "const_int_operand"                 "  i,  i,  i,  i,    i,    i")
+	     (match_operand 8 "const_int_operand"                 "  i,  i,  i,  i,    i,    i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
 	  (unspec:VEEWTRUNC8
-	    [(match_operand 3 "pmode_register_operand"            "    r,    r")
+	    [(match_operand 3 "pmode_register_operand"            "  r,  r,  r,  r,    r,    r")
 	     (mem:BLK (scratch))
-	     (match_operand:<VINDEX_OCT_EXT> 4 "register_operand" "   vr,   vr")] ORDER)
-	  (match_operand:VEEWTRUNC8 2 "vector_merge_operand"      "   vu,    0")))]
+	     (match_operand:<VINDEX_OCT_EXT> 4 "register_operand" "  0,  0,  0,  0,   vr,   vr")] ORDER)
+	  (match_operand:VEEWTRUNC8 2 "vector_merge_operand"      " vu,  0, vu,  0,   vu,    0")))]
   "TARGET_VECTOR"
   "vl<order>xei<oct_ext_sew>.v\t%0,(%3),%4%p1"
   [(set_attr "type" "vld<order>x")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/narrow_constraint-12.c b/gcc/testsuite/gcc.target/riscv/rvv/base/narrow_constraint-12.c
new file mode 100644
index 00000000000..df5b2dc5c51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/narrow_constraint-12.c
@@ -0,0 +1,303 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+
+void f0 (void *base,void *out,size_t vl)
+{
+    vuint64m1_t bindex = __riscv_vle64_v_u64m1 (base, vl);
+    vint8mf8_t v = __riscv_vluxei64_v_i8mf8(base,bindex,vl);
+    __riscv_vse8_v_i8mf8 (out,v,vl);
+}
+
+void f1 (void *base,void *out,size_t vl)
+{
+    vuint64m1_t bindex = __riscv_vle64_v_u64m1 (base, vl);
+    vint8mf8_t bindex2 = __riscv_vle8_v_i8mf8 ((void *)(base + 100), vl);
+    vint8mf8_t v = __riscv_vluxei64_v_i8mf8_tu(bindex2,base,bindex,vl);
+    __riscv_vse8_v_i8mf8 (out,v,vl);
+}
+
+void f2 (void *base,void *out,size_t vl)
+{
+    vuint64m1_t bindex = __riscv_vle64_v_u64m1 (base, vl);
+    vint8mf8_t v = __riscv_vluxei64_v_i8mf8(base,bindex,vl);
+    vuint64m1_t v2 = __riscv_vadd_vv_u64m1 (bindex, bindex,vl);
+    __riscv_vse8_v_i8mf8 (out,v,vl);
+    __riscv_vse64_v_u64m1 ((void *)out,v2,vl);
+}
+
+void f3 (void *base,void *out,size_t vl, int n)
+{
+    for (int i = 0; i < n; i++){
+      vuint64m1_t bindex = __riscv_vle64_v_u64m1 (base + 100*i, vl);
+      vint8mf8_t v = __riscv_vluxei64_v_i8mf8(base,bindex,vl);
+      vuint64m1_t v2 = __riscv_vadd_vv_u64m1 (bindex, bindex,vl);
+      __riscv_vse8_v_i8mf8 (out + 100*i,v,vl);
+      __riscv_vse64_v_u64m1 ((void *)(out + 200*i),v2,vl);
+    }
+}
+
+void f4 (void *base,void *out,size_t vl)
+{
+    vuint64m1_t bindex = __riscv_vle64_v_u64m1 (base, vl);
+    vint8mf8_t v = __riscv_vluxei64_v_i8mf8(base,bindex,vl);
+    v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+    v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+    vuint64m1_t v2 = __riscv_vadd_vv_u64m1 (bindex, bindex,vl);
+    __riscv_vse8_v_i8mf8 (out,v,vl);
+    __riscv_vse64_v_u64m1 ((void *)out,v2,vl);
+}
+
+void f5 (void *base,void *base2,void *out,size_t vl, int n)
+{
+    vuint64m1_t bindex = __riscv_vle64_v_u64m1 (base + 100, vl);
+    for (int i = 0; i < n; i++){
+      vbool64_t m = __riscv_vlm_v_b64 (base + i, vl);
+      vint8mf8_t v = __riscv_vluxei64_v_i8mf8_m(m,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vle8_v_i8mf8_tu (v, base2, vl);
+      __riscv_vse8_v_i8mf8 (out + 100*i,v,vl);
+    }
+}
+
+void f6 (void *base,void *out,size_t vl)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base, vl);
+    vint8m1_t v = __riscv_vluxei64_v_i8m1(base,bindex,vl);
+    __riscv_vse8_v_i8m1 (out,v,vl);
+}
+
+void f7 (void *base,void *out,size_t vl)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base, vl);
+    vint8m1_t src = __riscv_vle8_v_i8m1 ((void *)(base + 100), vl);
+    vint8m1_t v = __riscv_vluxei64_v_i8m1_tu(src,base,bindex,vl);
+    __riscv_vse8_v_i8m1 (out,v,vl);
+}
+
+void f8 (void *base,void *out,size_t vl)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base, vl);
+    vint8m1_t v = __riscv_vluxei64_v_i8m1(base,bindex,vl);
+    vuint64m8_t v2 = __riscv_vadd_vv_u64m8 (bindex, bindex,vl);
+    __riscv_vse8_v_i8m1 (out,v,vl);
+    __riscv_vse64_v_u64m8 ((void *)out,v2,vl);
+}
+
+void f9 (void *base,void *out,size_t vl, int n)
+{
+    for (int i = 0; i < n; i++){
+      vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100*i, vl);
+      vint8m1_t v = __riscv_vluxei64_v_i8m1(base,bindex,vl);
+      vuint64m8_t v2 = __riscv_vadd_vv_u64m8 (bindex, bindex,vl);
+      __riscv_vse8_v_i8m1 (out + 100*i,v,vl);
+      __riscv_vse64_v_u64m8 ((void *)(out + 200*i),v2,vl);
+    }
+}
+
+void f10 (void *base,void *out,size_t vl)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base, vl);
+    vint8m1_t v = __riscv_vluxei64_v_i8m1(base,bindex,vl);
+    v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+    v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+    vuint64m8_t v2 = __riscv_vadd_vv_u64m8 (bindex, bindex,vl);
+    __riscv_vse8_v_i8m1 (out,v,vl);
+    __riscv_vse64_v_u64m8 ((void *)out,v2,vl);
+}
+
+void f11 (void *base,void *base2,void *out,size_t vl, int n)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100, vl);
+    for (int i = 0; i < n; i++){
+      vbool8_t m = __riscv_vlm_v_b8 (base + i, vl);
+      vint8m1_t v = __riscv_vluxei64_v_i8m1_m(m,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vle8_v_i8m1_tu (v, base2, vl);
+      __riscv_vse8_v_i8m1 (out + 100*i,v,vl);
+    }
+}
+
+void f12 (void *base,void *out,size_t vl, int n)
+{
+    vint8mf8_t v = __riscv_vle8_v_i8mf8 ((void *)(base + 1000), vl);
+    for (int i = 0; i < n; i++){
+      vuint64m1_t bindex = __riscv_vle64_v_u64m1 (base + 100*i, vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      __riscv_vse8_v_i8mf8 (out + 100*i,v,vl);
+    }
+}
+
+void f13 (void *base,void *out,size_t vl, int n)
+{
+    vint8m1_t v = __riscv_vle8_v_i8m1 ((void *)(base + 1000), vl);
+    for (int i = 0; i < n; i++){
+      vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100*i, vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      __riscv_vse8_v_i8m1 (out + 100*i,v,vl);
+    }
+}
+
+void f14 (void *base,void *out,size_t vl, int n)
+{
+    for (int i = 0; i < n; i++){
+      vint8mf8_t v = __riscv_vle8_v_i8mf8 ((void *)(base + 1000 * i), vl);
+      vuint64m1_t bindex = __riscv_vle64_v_u64m1 (base + 100*i, vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex,vl);
+      __riscv_vse8_v_i8mf8 (out + 100*i,v,vl);
+    }
+}
+
+void f15 (void *base,void *out,size_t vl, int n)
+{
+    for (int i = 0; i < n; i++){
+      vint8m1_t v = __riscv_vle8_v_i8m1 ((void *)(base + 1000 * i), vl);
+      vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100*i, vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex,vl);
+      __riscv_vse8_v_i8m1 (out + 100*i,v,vl);
+    }
+}
+
+void f16 (void *base,void *out,size_t vl, int n)
+{
+    for (int i = 0; i < n; i++){
+      vint8mf8_t v = __riscv_vle8_v_i8mf8 ((void *)(base + 1000 * i), vl);
+      vuint64m1_t bindex1 = __riscv_vle64_v_u64m1 (base + 100*i, vl);
+      vuint64m1_t bindex2 = __riscv_vle64_v_u64m1 (base + 200*i, vl);
+      vuint64m1_t bindex3 = __riscv_vle64_v_u64m1 (base + 300*i, vl);
+      vuint64m1_t bindex4 = __riscv_vle64_v_u64m1 (base + 400*i, vl);
+      vuint64m1_t bindex5 = __riscv_vle64_v_u64m1 (base + 500*i, vl);
+      vuint64m1_t bindex6 = __riscv_vle64_v_u64m1 (base + 600*i, vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex1,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex2,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex3,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex4,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex5,vl);
+      v = __riscv_vluxei64_v_i8mf8_tu(v,base,bindex6,vl);
+      __riscv_vse8_v_i8mf8 (out + 100*i,v,vl);
+    }
+}
+
+void f17 (void *base,void *out,size_t vl, int n)
+{
+    for (int i = 0; i < n; i++){
+      vint8m1_t v = __riscv_vle8_v_i8m1 ((void *)(base + 1000 * i), vl);
+      vuint64m8_t bindex1 = __riscv_vle64_v_u64m8 (base + 100*i, vl);
+      vuint64m8_t bindex2 = __riscv_vle64_v_u64m8 (base + 200*i, vl);
+      vuint64m8_t bindex3 = __riscv_vle64_v_u64m8 (base + 300*i, vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex1,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex2,vl);
+      v = __riscv_vluxei64_v_i8m1_tu(v,base,bindex3,vl);
+      __riscv_vse8_v_i8m1 (out + 100*i,v,vl);
+    }
+}
+
+void f18 (void *base,void *base2,void *out,size_t vl, int n)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100, vl);
+    for (int i = 0; i < n; i++){
+      vbool8_t m = __riscv_vlm_v_b8 (base + i, vl);
+      vuint32m4_t v = __riscv_vluxei64_v_u32m4_m(m,base,bindex,vl);
+      vuint32m4_t v2 = __riscv_vle32_v_u32m4_tu (v, base2 + i, vl);
+      vint8m1_t v3 = __riscv_vluxei32_v_i8m1_m(m,base,v2,vl);
+      __riscv_vse8_v_i8m1 (out + 100*i,v3,vl);
+    }
+}
+
+void f19 (void *base,void *base2,void *out,size_t vl, int n)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100, vl);
+    for (int i = 0; i < n; i++){
+      vbool8_t m = __riscv_vlm_v_b8 (base + i, vl);
+      vuint64m8_t v = __riscv_vluxei64_v_u64m8_m(m,base,bindex,vl);
+      vuint64m8_t v2 = __riscv_vle64_v_u64m8_tu (v, base2 + i, vl);
+      vint8m1_t v3 = __riscv_vluxei64_v_i8m1_m(m,base,v,vl);
+      vint8m1_t v4 = __riscv_vluxei64_v_i8m1_m(m,base,v2,vl);
+      __riscv_vse8_v_i8m1 (out + 100*i,v3,vl);
+      __riscv_vse8_v_i8m1 (out + 222*i,v4,vl);
+    }
+}
+void f20 (void *base,void *out,size_t vl)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base, vl);
+    asm volatile("#" ::
+		 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
+		   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", 
+		   "v18", "v19", "v20", "v21", "v22", "v23");
+
+    vint8m1_t v = __riscv_vluxei64_v_i8m1(base,bindex,vl);
+    asm volatile("#" ::                                                        
+		 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", 
+		   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",     
+		   "v18", "v19", "v20", "v21", "v22", "v23", "v25",     
+		   "v26", "v27", "v28", "v29", "v30", "v31");
+
+    __riscv_vse8_v_i8m1 (out,v,vl);
+}
+
+void f21 (void *base,void *out,size_t vl)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base, vl);
+    vbool8_t m = __riscv_vlm_v_b8 (base, vl);
+    asm volatile("#" ::
+		 : "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
+		   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", 
+		   "v18", "v19", "v20", "v21", "v22", "v23");
+
+    vint8m1_t v = __riscv_vluxei64_v_i8m1_m(m,base,bindex,vl);
+    asm volatile("#" ::                                                        
+		 : "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", 
+		   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",     
+		   "v18", "v19", "v20", "v21", "v22", "v23", "v25",     
+		   "v26", "v27", "v28", "v29", "v30", "v31");
+
+    __riscv_vse8_v_i8m1 (out,v,vl);
+}
+
+void f22 (void *base,void *out,size_t vl)
+{
+    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base, vl);
+    asm volatile("#" ::
+		 : "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9",
+		   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17", 
+		   "v18", "v19", "v20", "v21", "v22", "v23");
+
+    vint8m1_t v = __riscv_vluxei64_v_i8m1(base,bindex,vl);
+    asm volatile("#" ::                                                        
+		 : "v0", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", 
+		   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",     
+		   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",     
+		   "v26", "v27", "v28", "v29", "v30", "v31");
+    v = __riscv_vadd_vv_i8m1 (v,v,vl);
+    asm volatile("#" ::                                                        
+		 : "v0", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", 
+		   "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",     
+		   "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",     
+		   "v26", "v27", "v28", "v29", "v30", "v31");
+
+    __riscv_vse8_v_i8m1 (out,v,vl);
+}
+
+/* { dg-final { scan-assembler-times {vmv} 1 } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
-- 
2.36.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Fine tune gather load RA constraint
  2023-03-13  8:28 [PATCH] RISC-V: Fine tune gather load RA constraint juzhe.zhong
@ 2023-03-14 18:08 ` Jeff Law
  2023-03-15  6:52   ` juzhe.zhong
  2023-04-21 20:36 ` Jeff Law
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Law @ 2023-03-14 18:08 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: kito.cheng



On 3/13/23 02:28, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> For DEST EEW < SOURCE EEW, we can partial overlap register
> according to RVV ISA.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/vector.md: Fix RA constraint.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/base/narrow_constraint-12.c: New test.
Similarly.  I think this can wait for gcc-14.

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
  2023-03-14 18:08 ` Jeff Law
@ 2023-03-15  6:52   ` juzhe.zhong
  2023-03-19 16:55     ` Jeff Law
  0 siblings, 1 reply; 8+ messages in thread
From: juzhe.zhong @ 2023-03-15  6:52 UTC (permalink / raw)
  To: jeffreyalaw, gcc-patches; +Cc: kito.cheng

[-- Attachment #1: Type: text/plain, Size: 1029 bytes --]

Hi, Jeff. I really hope the current "fine tune RA constraint" patches can be merged into GCC-13.
These patches just make the RA constraints consistent with the RVV ISA, which I have double-checked.
Changing these RA constraints is very safe.
This is the last piece of work that I want to get into GCC-13. 

More patches I am gonna to send are going to expected to be merged into GCC-14.

Thanks.


juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-03-15 02:08
To: juzhe.zhong; gcc-patches
CC: kito.cheng
Subject: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
 
 
On 3/13/23 02:28, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> For DEST EEW < SOURCE EEW, we can partial overlap register
> according to RVV ISA.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/vector.md: Fix RA constraint.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/base/narrow_constraint-12.c: New test.
Similarly.  I think this can wait for gcc-14.
 
jeff
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Fine tune gather load RA constraint
  2023-03-15  6:52   ` juzhe.zhong
@ 2023-03-19 16:55     ` Jeff Law
  2023-03-20  0:49       ` juzhe.zhong
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Law @ 2023-03-19 16:55 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: kito.cheng



On 3/15/23 00:52, juzhe.zhong@rivai.ai wrote:
> Hi, Jeff. I really hope the current "refine tune RA constraint" patches 
> can be merged into GCC-13.
> These patches are just making RA constraint to be consistent with RVV 
> ISA after I double checked RVV ISA.
> These RA constraints changing is very safe.
They may be very safe, but we're *way* past the point where we should be 
making this kind of change.  When I agreed to not object to including 
the RVV builtins in gcc-13, I never imagined we'd still be making 
changes to that code in March.   My bad for not getting clarification on 
how much work remained to be done.


Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
  2023-03-19 16:55     ` Jeff Law
@ 2023-03-20  0:49       ` juzhe.zhong
  0 siblings, 0 replies; 8+ messages in thread
From: juzhe.zhong @ 2023-03-20  0:49 UTC (permalink / raw)
  To: jeffreyalaw, gcc-patches; +Cc: kito.cheng

[-- Attachment #1: Type: text/plain, Size: 1010 bytes --]

It's ok to defer them to GCC-14. I will keep testing and fixing bugs during these 2 months.
I won't support any more feature or optimizations until GCC-14 is open.



juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-03-20 00:55
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng
Subject: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
 
 
On 3/15/23 00:52, juzhe.zhong@rivai.ai wrote:
> Hi, Jeff. I really hope the current "refine tune RA constraint" patches 
> can be merged into GCC-13.
> These patches are just making RA constraint to be consistent with RVV 
> ISA after I double checked RVV ISA.
> These RA constraints changing is very safe.
They may be very safe, but we're *way* past the point where we should be 
making this kind of change.  When I agreed to not object to including 
the RVV builtins in gcc-13, I never imagined we'd still be making 
changes to that code in March.   My bad for not getting clarification on 
how much work remained to be done.
 
 
Jeff
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Fine tune gather load RA constraint
  2023-03-13  8:28 [PATCH] RISC-V: Fine tune gather load RA constraint juzhe.zhong
  2023-03-14 18:08 ` Jeff Law
@ 2023-04-21 20:36 ` Jeff Law
  2023-04-24  3:05   ` juzhe.zhong
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Law @ 2023-04-21 20:36 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches; +Cc: kito.cheng



On 3/13/23 02:28, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> For DEST EEW < SOURCE EEW, we can partial overlap register
> according to RVV ISA.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/vector.md: Fix RA constraint.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/base/narrow_constraint-12.c: New test.
This is OK.

The one question I keep having when I read these patterns is why we have 
the earlyclobber.

Earlyclobber means that the output is potentially written before the 
inputs are consumed.   Typically for a single instruction pattern such 
constraints wouldn't make a lot of sense as *usually* the inputs are 
consumed before the output is written.

Just looking for a clarification as to why the earlyclobbers are needed 
at all, particularly for non-reduction patterns.

jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
  2023-04-21 20:36 ` Jeff Law
@ 2023-04-24  3:05   ` juzhe.zhong
  2023-04-26  4:21     ` Kito Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: juzhe.zhong @ 2023-04-24  3:05 UTC (permalink / raw)
  To: jeffreyalaw, gcc-patches; +Cc: kito.cheng

[-- Attachment #1: Type: text/plain, Size: 1368 bytes --]

Adding the earlyclobber is to ensure that the dest operand does not overlap with the source operand.
For example:
for a gather load, vluxei.v v8,(a5),v8 is illegal according to the RVV ISA.
GCC uses the same approach as LLVM, which also adds an earlyclobber to model the disallowed overlap between the dest and source operands.



juzhe.zhong@rivai.ai
 
From: Jeff Law
Date: 2023-04-22 04:36
To: juzhe.zhong; gcc-patches
CC: kito.cheng
Subject: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
 
 
On 3/13/23 02:28, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> For DEST EEW < SOURCE EEW, we can partial overlap register
> according to RVV ISA.
> 
> gcc/ChangeLog:
> 
>          * config/riscv/vector.md: Fix RA constraint.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/rvv/base/narrow_constraint-12.c: New test.
This is OK.
 
The one question I keep having when I read these patterns is why we have 
the earlyclobber.
 
Earlyclobber means that the output is potentially written before the 
inputs are consumed.   Typically for a single instruction pattern such 
constraints wouldn't make a lot of sense as *usually* the inputs are 
consumed before the output is written.
 
Just looking for a clarification as to why the earlyclobbers are needed 
at all, particularly for non-reduction patterns.
 
jeff
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
  2023-04-24  3:05   ` juzhe.zhong
@ 2023-04-26  4:21     ` Kito Cheng
  0 siblings, 0 replies; 8+ messages in thread
From: Kito Cheng @ 2023-04-26  4:21 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: jeffreyalaw, gcc-patches

Committed to trunk

On Mon, Apr 24, 2023 at 11:06 AM juzhe.zhong@rivai.ai
<juzhe.zhong@rivai.ai> wrote:
>
> Adding  earlyclobber is to make dest operand do not overlap with source operand.
> For example:
> for gather load, vluxei.v v8,(a5),v8 is illegal according to RVV ISA.
> GCC is using same way as LLVM which is also adding earlyclobber for modeling disabling overlap between dest and source operand.
>
>
>
> juzhe.zhong@rivai.ai
>
> From: Jeff Law
> Date: 2023-04-22 04:36
> To: juzhe.zhong; gcc-patches
> CC: kito.cheng
> Subject: Re: [PATCH] RISC-V: Fine tune gather load RA constraint
>
>
> On 3/13/23 02:28, juzhe.zhong@rivai.ai wrote:
> > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> >
> > For DEST EEW < SOURCE EEW, we can partial overlap register
> > according to RVV ISA.
> >
> > gcc/ChangeLog:
> >
> >          * config/riscv/vector.md: Fix RA constraint.
> >
> > gcc/testsuite/ChangeLog:
> >
> >          * gcc.target/riscv/rvv/base/narrow_constraint-12.c: New test.
> This is OK.
>
> The one question I keep having when I read these patterns is why we have
> the earlyclobber.
>
> Earlyclobber means that the output is potentially written before the
> inputs are consumed.   Typically for a single instruction pattern such
> constraints wouldn't make a lot of sense as *usually* the inputs are
> consumed before the output is written.
>
> Just looking for a clarification as to why the earlyclobbers are needed
> at all, particularly for non-reduction patterns.
>
> jeff
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-04-26  4:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-13  8:28 [PATCH] RISC-V: Fine tune gather load RA constraint juzhe.zhong
2023-03-14 18:08 ` Jeff Law
2023-03-15  6:52   ` juzhe.zhong
2023-03-19 16:55     ` Jeff Law
2023-03-20  0:49       ` juzhe.zhong
2023-04-21 20:36 ` Jeff Law
2023-04-24  3:05   ` juzhe.zhong
2023-04-26  4:21     ` Kito Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).