[PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

public inbox for gcc-patches@gcc.gnu.org

* [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
@ 2023-06-01  8:32 juzhe.zhong
  2023-06-02  7:01 ` Robin Dapp
  0 siblings, 1 reply; 7+ messages in thread
From: juzhe.zhong @ 2023-06-01  8:32 UTC (permalink / raw)
  To: gcc-patches
  Cc: kito.cheng, kito.cheng, palmer, palmer, jeffreyalaw, rdapp.gcc,
	Juzhe-Zhong

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

This patch enhances the combine optimizations for vwmul.vv.
Consider the following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
		      int16_t *__restrict dst3, int16_t *__restrict dst4,
		      int8_t *__restrict a, int8_t *__restrict b,
		      int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}

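In such a complicated case, an operand is not single-use: it is used by multiple statements.
The GCC combine pass iterates over the combinations of the operands.
RVV has no vwmul.wv instruction (unlike vwadd.wv), so this patch adds a pseudo
vwmul.wv pattern as an intermediate RTL form that helps combine reach vwmul.vv.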
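
To make the intermediate form concrete, here is a minimal scalar sketch in C,
one element at a time, of the three shapes involved: extend both operands and
then multiply (the unfused code), the pseudo vwmul.wv form (one extension
folded, the other operand already wide), and vwmul.vv (both extensions folded).
All function names are hypothetical and only model the arithmetic; they are not
GCC or RVV APIs. The check confirms that the three forms agree for every int8_t
pair, which is what makes folding one extension at a time safe.

```c
#include <assert.h>
#include <stdint.h>

/* Unfused form: two vsext.vf2 followed by vmul.vv, modeled per element.  */
int16_t
mul_after_two_extends (int8_t a, int8_t b)
{
  int16_t wide_a = (int16_t) a;
  int16_t wide_b = (int16_t) b;
  return (int16_t) (wide_a * wide_b);
}

/* Hypothetical model of the pseudo vwmul.wv form: one operand is already
   wide, and the extension of the other is folded into the multiply.  */
int16_t
pseudo_vwmul_wv (int16_t wide, int8_t narrow)
{
  return (int16_t) (wide * (int16_t) narrow);
}

/* Model of vwmul.vv: both extensions are folded into the multiply.  */
int16_t
vwmul_vv (int8_t a, int8_t b)
{
  return (int16_t) ((int16_t) a * (int16_t) b);
}

/* Return the number of int8_t pairs on which the three forms disagree.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int a = INT8_MIN; a <= INT8_MAX; a++)
    for (int b = INT8_MIN; b <= INT8_MAX; b++)
      {
        int16_t ref = mul_after_two_extends ((int8_t) a, (int8_t) b);
        if (pseudo_vwmul_wv ((int16_t) a, (int8_t) b) != ref
            || vwmul_vv ((int8_t) a, (int8_t) b) != ref)
          bad++;
      }
  return bad;
}

/* Exhaustive self-check over all 65536 operand pairs.  */
void
self_check (void)
{
  assert (count_mismatches () == 0);
}
```

Calling self_check () from a small main is enough to run the comparison.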

We also add another vwmulsu.vv pattern to enhance the vwmulsu.vv optimization.
Currently, vector.md has this format for the intrinsics:

(mult: (sign_extend) (zero_extend))

Now we add a new vwmulsu.vv pattern with the operands in the opposite order:

(mult: (zero_extend) (sign_extend))

This handles cases such as the following, which mix signed and unsigned widening multiplication:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
		      int16_t *__restrict dst3, int16_t *__restrict dst4,
		      int8_t *__restrict a, uint8_t *__restrict b,
		      uint8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}
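
Since multiplication is commutative, both RTL operand orders describe the same
arithmetic; the extra pattern presumably exists so that combine can match
whichever order it ends up building. A minimal scalar sketch in C of the two
orders follows. The function names are hypothetical and model a single element;
they are not GCC or RVV APIs.

```c
#include <assert.h>
#include <stdint.h>

/* (mult (sign_extend s) (zero_extend u)): the order already in vector.md.  */
int16_t
mul_sext_zext (int8_t s, uint8_t u)
{
  return (int16_t) ((int16_t) s * (int16_t) u);
}

/* (mult (zero_extend u) (sign_extend s)): the order of the new pattern.  */
int16_t
mul_zext_sext (uint8_t u, int8_t s)
{
  return (int16_t) ((int16_t) u * (int16_t) s);
}

/* Return the number of (signed, unsigned) pairs where the orders differ.  */
int
count_order_mismatches (void)
{
  int bad = 0;
  for (int s = INT8_MIN; s <= INT8_MAX; s++)
    for (int u = 0; u <= UINT8_MAX; u++)
      if (mul_sext_zext ((int8_t) s, (uint8_t) u)
          != mul_zext_sext ((uint8_t) u, (int8_t) s))
        bad++;
  return bad;
}

/* Both orders must yield the same 16-bit product for every input pair.  */
void
check_orders (void)
{
  assert (count_order_mismatches () == 0);
}
```

The signed operand keeps its sign and the unsigned operand is never
reinterpreted, so for example -1 times 255 is -255 in both orders.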

Before this patch:

...
        vsetvli zero,t1,e8,m1,ta,ma
        vle8.v  v1,0(a4)
        vsetvli t3,zero,e16,m2,ta,ma
        vsext.vf2       v6,v1
        vsetvli zero,t1,e8,m1,ta,ma
        vle8.v  v1,0(a5)
        vsetvli t3,zero,e16,m2,ta,ma
        add     t0,a0,t4
        vzext.vf2       v4,v1
        vmul.vv v2,v4,v6
        vsetvli zero,t1,e16,m2,ta,ma
        vse16.v v2,0(t0)
        vle8.v  v1,0(a6)
        vsetvli t3,zero,e16,m2,ta,ma
        add     t0,a1,t4
        vzext.vf2       v2,v1
        vmul.vv v4,v2,v4
        vsetvli zero,t1,e16,m2,ta,ma
        vse16.v v4,0(t0)
        vsetvli t3,zero,e16,m2,ta,ma
        add     t0,a2,t4
        vmul.vv v2,v2,v6
        vsetvli zero,t1,e16,m2,ta,ma
        vse16.v v2,0(t0)
        add     t0,a3,t4
        vle8.v  v1,0(a7)
        vsetvli t3,zero,e16,m2,ta,ma
        sub     t6,t6,t1
        vsext.vf2       v2,v1
        vmul.vv v2,v2,v6
        vsetvli zero,t1,e16,m2,ta,ma
        vse16.v v2,0(t0)
...

After this patch:
...
        vsetvli zero,t1,e8,mf2,ta,ma
        vle8.v  v1,0(a4)
        vle8.v  v3,0(a5)
        vsetvli t6,zero,e8,mf2,ta,ma
        add     t0,a0,t3
        vwmulsu.vv      v2,v1,v3
        vsetvli zero,t1,e16,m1,ta,ma
        vse16.v v2,0(t0)
        vle8.v  v2,0(a6)
        vsetvli t6,zero,e8,mf2,ta,ma
        add     t0,a1,t3
        vwmulu.vv       v4,v3,v2
        vsetvli zero,t1,e16,m1,ta,ma
        vse16.v v4,0(t0)
        vsetvli t6,zero,e8,mf2,ta,ma
        add     t0,a2,t3
        vwmulsu.vv      v3,v1,v2
        vsetvli zero,t1,e16,m1,ta,ma
        vse16.v v3,0(t0)
        add     t0,a3,t3
        vle8.v  v3,0(a7)
        vsetvli t6,zero,e8,mf2,ta,ma
        sub     t4,t4,t1
        vwmul.vv        v2,v1,v3
        vsetvli zero,t1,e16,m1,ta,ma
        vse16.v v2,0(t0)
...
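
The instruction mix in the listing above can be cross-checked against the
signedness of the operands in the mixed example (a and b2 are int8_t, b and a2
are uint8_t). The sketch below is only an illustration: widening_mul_for and
the other names are hypothetical helpers, not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: choose the widening multiply for two narrow
   operands, given whether each one is signed.  */
const char *
widening_mul_for (bool lhs_signed, bool rhs_signed)
{
  if (lhs_signed && rhs_signed)
    return "vwmul.vv";
  if (!lhs_signed && !rhs_signed)
    return "vwmulu.vv";
  return "vwmulsu.vv";
}

/* Operand signedness of the four statements in the loop body, in source
   order.  a and b2 are signed; b and a2 are unsigned.  */
static const bool stmts[4][2] = {
  { true, false },  /* dst  = a  * b  */
  { false, false }, /* dst2 = a2 * b  */
  { false, true },  /* dst3 = a2 * a  */
  { true, true },   /* dst4 = a  * b2 */
};

/* Count how many of the four statements map to the given mnemonic.  */
int
count_insn (const char *mnemonic)
{
  int n = 0;
  for (int i = 0; i < 4; i++)
    if (strcmp (widening_mul_for (stmts[i][0], stmts[i][1]), mnemonic) == 0)
      n++;
  return n;
}

/* Compare against the order of the multiplies in the "after" listing.  */
void
check_against_listing (void)
{
  static const char *const listing[4]
    = { "vwmulsu.vv", "vwmulu.vv", "vwmulsu.vv", "vwmul.vv" };
  for (int i = 0; i < 4; i++)
    assert (strcmp (widening_mul_for (stmts[i][0], stmts[i][1]),
                    listing[i]) == 0);
}
```

Each function therefore needs two vwmulsu.vv, one vwmulu.vv and one vwmul.vv.
Multiplied by the three type triples in widen-complicate-4.c, that gives 6, 3
and 3, matching that test's scan-assembler-times directives.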

gcc/ChangeLog:

        * config/riscv/vector.md: Add autovec-opt.md.
        * config/riscv/autovec-opt.md: New file.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.

---
 gcc/config/riscv/autovec-opt.md               | 80 +++++++++++++++++++
 gcc/config/riscv/vector.md                    |  3 +-
 .../riscv/rvv/autovec/widen/widen-7.c         | 27 +++++++
 .../rvv/autovec/widen/widen-complicate-3.c    | 32 ++++++++
 .../rvv/autovec/widen/widen-complicate-4.c    | 31 +++++++
 .../riscv/rvv/autovec/widen/widen_run-7.c     | 34 ++++++++
 6 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/autovec-opt.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
new file mode 100644
index 00000000000..92cdc4e9a16
--- /dev/null
+++ b/gcc/config/riscv/autovec-opt.md
@@ -0,0 +1,80 @@
+;; Machine description for optimization of RVV auto-vectorization.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; RVV has no vwmul.wv instruction (unlike vwadd.wv).
+;; This pattern is an intermediate RTL form, a pseudo vwmul.wv, that
+;; helps the combine pass optimize instructions.
+(define_insn_and_split "@pred_single_widen_mul<any_extend:su><mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand"                  "=&vr,&vr")
+	(if_then_else:VWEXTI
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"              "   rK,   rK")
+	     (match_operand 6 "const_int_operand"                  "    i,    i")
+	     (match_operand 7 "const_int_operand"                  "    i,    i")
+	     (match_operand 8 "const_int_operand"                  "    i,    i")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	  (mult:VWEXTI
+	    (any_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+	    (match_operand:VWEXTI 3 "register_operand"             "   vr,   vr"))
+	  (match_operand:VWEXTI 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_vf2 (<CODE>, <MODE>mode);
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    rtx ops[] = {tmp, operands[4]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
+
+    emit_insn (gen_pred (MULT, <MODE>mode, operands[0], operands[1], operands[2],
+			 operands[3], tmp, operands[5], operands[6],
+			 operands[7], operands[8]));
+    DONE;
+  }
+  [(set_attr "type" "viwmul")
+   (set_attr "mode" "<MODE>")])
+
+;; This pattern is to enhance the instruction combine optimizations for complicated
+;; signed and unsigned widening multiplication operations.
+(define_insn "*pred_widen_mulsu<mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand"                  "=&vr,&vr")
+	(if_then_else:VWEXTI
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"              "   rK,   rK")
+	     (match_operand 6 "const_int_operand"                  "    i,    i")
+	     (match_operand 7 "const_int_operand"                  "    i,    i")
+	     (match_operand 8 "const_int_operand"                  "    i,    i")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	  (mult:VWEXTI
+	    (zero_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+	    (sign_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand" "   vr,   vr")))
+	  (match_operand:VWEXTI 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR"
+  "vwmulsu.vv\t%0,%3,%4%p1"
+  [(set_attr "type" "viwmul")
+   (set_attr "mode" "<V_DOUBLE_TRUNC>")])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index c74dce89db6..419853a93c1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -24,7 +24,7 @@
 ;;
 ;; - Intrinsics (https://github.com/riscv/rvv-intrinsic-doc)
 ;; - Auto-vectorization (autovec.md)
-;; - Combine optimization (TBD)
+;; - Optimization (autovec-opt.md)
 
 (include "vector-iterators.md")
 
@@ -8422,3 +8422,4 @@
 )
 
 (include "autovec.md")
+(include "autovec-opt.md")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
new file mode 100644
index 00000000000..cc43d9ba3fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwmul_##TYPE1_##TYPE2 (TYPE1 *__restrict dst,   \
+						      TYPE2 *__restrict a,     \
+						      TYPE1 *__restrict b,     \
+						      int n)                   \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE1) a[i]) * b[i];                                          \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)                                                 \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvsext\.vf2} 3 } } */
+/* { dg-final { scan-assembler-times {\tvzext\.vf2} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
new file mode 100644
index 00000000000..e1fd79430c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
+    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
+    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
+    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      {                                                                        \
+	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
+	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
+	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
+	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
+      }                                                                        \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)                                                 \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmul\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvwmulu\.vv} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
new file mode 100644
index 00000000000..15fdefc550b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2, TYPE3)                                         \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
+    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
+    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE3 *__restrict b,          \
+    TYPE3 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      {                                                                        \
+	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
+	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
+	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
+	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
+      }                                                                        \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t, uint8_t)                                         \
+  TEST_TYPE (int32_t, int16_t, uint16_t)                                       \
+  TEST_TYPE (int64_t, int32_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmulsu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwmul\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwmulu\.vv} 3 } } */
+/* { dg-final { scan-assembler-not {\tvmul} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
new file mode 100644
index 00000000000..4abddd5d718
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
@@ -0,0 +1,34 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include <assert.h>
+#include "widen-7.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE1 b##TYPE1[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % LIMIT;                                         \
+      b##TYPE1[i] = LIMIT + i & LIMIT;                                         \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE1, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == (((TYPE1) a##TYPE2[i]) * b##TYPE1[i]));
+
+#define RUN_ALL()                                                              \
+  RUN (int16_t, int8_t, -128)                                                  \
+  RUN (uint16_t, uint8_t, 255)                                                 \
+  RUN (int32_t, int16_t, -32768)                                               \
+  RUN (uint32_t, uint16_t, 65535)                                              \
+  RUN (int64_t, int32_t, -2147483648)                                          \
+  RUN (uint64_t, uint32_t, 4294967295)
+
+int
+main ()
+{
+  RUN_ALL ()
+}
-- 
2.36.1


* Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
  2023-06-01  8:32 [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations juzhe.zhong
@ 2023-06-02  7:01 ` Robin Dapp
  2023-06-02  7:13   ` juzhe.zhong
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Dapp @ 2023-06-02  7:01 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches
  Cc: rdapp.gcc, kito.cheng, kito.cheng, palmer, palmer, jeffreyalaw

Hi Juzhe,

> [before/after assembly listings snipped]

I like the code examples in general but find them hard to read
at lengths > 5-10 or so.  Could we condense this a bit?

> +(include "autovec-opt.md")
ACK for this.  We discussed before that we should not clutter the regular
autovec.md too much with combine-targeted patterns, so I'm in favor
of the separate file.

Overall this looks good to me.  I'm a bit wary about getting the costs
right for combine patterns but we can deal with this later.

Regards
 Robin

* Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
  2023-06-02  7:01 ` Robin Dapp
@ 2023-06-02  7:13   ` juzhe.zhong
  2023-06-02  7:18     ` Robin Dapp
  0 siblings, 1 reply; 7+ messages in thread
From: juzhe.zhong @ 2023-06-02  7:13 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches
  Cc: Robin Dapp, kito.cheng, Kito.cheng, palmer, palmer, jeffreyalaw

Hi, Robin.

>> I like the code examples in general but find them hard to read
>> at lengths > 5-10 or so.  Could we condense this a bit?
OK. Do I need to send V2, or can the commit log be condensed when the patch is merged?

>> I'm a bit wary about getting the costs
>> right for combine patterns but we can deal with this later.

No, you don't need to worry about combining extensions, and I don't think we need cost adjustments for combining extensions.

For vmv.v.x + vadd.vv ==> vadd.vx, we can't claim that vadd.vx is better, since it increases scalar register pressure.
So for that kind of combining, I would like to take another approach and combine the pattern carefully, with an accurate register pressure calculation.

For this patch, however:

vext.vf2 + vext.vf2 + vadd ==> vwadd.vv is always better.
I don't think it is possible for vwadd.vv to be worse.

Thanks.



juzhe.zhong@rivai.ai
 
* Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
  2023-06-02  7:13   ` juzhe.zhong
@ 2023-06-02  7:18     ` Robin Dapp
  2023-06-02  7:19       ` juzhe.zhong
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Dapp @ 2023-06-02  7:18 UTC (permalink / raw)
  To: juzhe.zhong, gcc-patches
  Cc: rdapp.gcc, kito.cheng, Kito.cheng, palmer, palmer, jeffreyalaw

>>> I like the code examples in general but find them hard to read
>>> at lengths > 5-10 or so.  Could we condense this a bit?
> Ok, Do I need to send V2 ? Or condense the commit log when merged the patch?

Sure, just condense a bit. No need for V2.

Regards
 Robin

* Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
  2023-06-02  7:18     ` Robin Dapp
@ 2023-06-02  7:19       ` juzhe.zhong
  2023-06-03  1:10         ` Kito Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: juzhe.zhong @ 2023-06-02  7:19 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches
  Cc: Robin Dapp, kito.cheng, Kito.cheng, palmer, palmer, jeffreyalaw

Thanks. I will wait for final approval from Jeff or Kito.



juzhe.zhong@rivai.ai
 
* Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
  2023-06-02  7:19       ` juzhe.zhong
@ 2023-06-03  1:10         ` Kito Cheng
  2023-06-03  1:51           ` Li, Pan2
  0 siblings, 1 reply; 7+ messages in thread
From: Kito Cheng @ 2023-06-03  1:10 UTC (permalink / raw)
  To: 钟居哲
  Cc: Robin Dapp, gcc-patches, kito.cheng, palmer, palmer, jeffreyalaw

LGTM, thanks :)

* RE: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
  2023-06-03  1:10         ` Kito Cheng
@ 2023-06-03  1:51           ` Li, Pan2
  0 siblings, 0 replies; 7+ messages in thread
From: Li, Pan2 @ 2023-06-03  1:51 UTC (permalink / raw)
  To: Kito Cheng, 钟居哲
  Cc: Robin Dapp, gcc-patches, kito.cheng, palmer, palmer, jeffreyalaw

Committed with Robin's suggestion for the commit log. Thanks, Robin and Kito.

Pan
