[gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

public inbox for gcc-cvs@sourceware.org

* [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
@ 2023-06-05 16:17 Jeff Law
From: Jeff Law @ 2023-06-05 16:17 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:d71d810441be342a6e546fe53762acfa4f144abc

commit d71d810441be342a6e546fe53762acfa4f144abc
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Thu Jun 1 16:32:12 2023 +0800

    RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
    
    This patch enhances the combine optimizations for vwmul.vv.
    Consider the following code:
    void
    vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
                          int16_t *__restrict dst3, int16_t *__restrict dst4,
                          int8_t *__restrict a, int8_t *__restrict b,
                          int8_t *__restrict a2, int8_t *__restrict b2, int n)
    {
      for (int i = 0; i < n; i++)
        {
          dst[i] = (int16_t) a[i] * (int16_t) b[i];
          dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
          dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
          dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
        }
    }
    
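    In such a complicated case, each operand is used by multiple statements rather than
    by a single one, so the GCC combine pass has to iterate over the combinations of the operands.
    The pseudo vwmul.wv pattern gives combine an intermediate form (one operand already
    widened, the other still narrow) that it can match on the way to vwmul.vv.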
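
As a rough scalar illustration (not part of the patch; the function names below are hypothetical), the pseudo vwmul.wv form can be thought of as a multiply where one input has already been widened. For every pair of int8_t inputs it yields the same int16_t result as the fully widening form, which is why combine can treat it as a stepping stone towards vwmul.vv:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vwmul.vv: both narrow operands are
   sign-extended before the multiply.  */
static int16_t
model_vwmul_vv (int8_t a, int8_t b)
{
  return (int16_t) ((int16_t) a * (int16_t) b);
}

/* Hypothetical scalar model of the pseudo vwmul.wv form: one operand is
   already wide, only the other one is extended.  */
static int16_t
model_vwmul_wv (int16_t wide, int8_t narrow)
{
  return (int16_t) (wide * (int16_t) narrow);
}

/* Count the int8_t pairs for which the two models disagree.  The largest
   product, (-128) * (-128) = 16384, still fits in int16_t.  */
static int
count_vwmul_mismatches (void)
{
  int mismatches = 0;
  for (int a = INT8_MIN; a <= INT8_MAX; a++)
    for (int b = INT8_MIN; b <= INT8_MAX; b++)
      if (model_vwmul_wv ((int16_t) a, (int8_t) b)
	  != model_vwmul_vv ((int8_t) a, (int8_t) b))
	mismatches++;
  return mismatches;
}
```

Calling count_vwmul_mismatches () from a test harness should report no mismatches. This only models element-wise arithmetic; masking, vl and tail policy are outside the sketch.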
    
    This patch also adds a second vwmulsu.vv pattern to improve vwmulsu.vv optimization.
    Currently, vector.md has a pattern of this form for intrinsic calls:
    
    (mult: (sign_extend) (zero_extend))
    
    Now we add a new vwmulsu.vv pattern with the operands in the other order:
    (mult: (zero_extend) (sign_extend))
    
    This handles the following case, which mixes signed and unsigned widening multiplications:
    void
    vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
                          int16_t *__restrict dst3, int16_t *__restrict dst4,
                          int8_t *__restrict a, uint8_t *__restrict b,
                          uint8_t *__restrict a2, int8_t *__restrict b2, int n)
    {
      for (int i = 0; i < n; i++)
        {
          dst[i] = (int16_t) a[i] * (int16_t) b[i];
          dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
          dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
          dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
        }
    }
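
A minimal scalar sketch of why the second operand order is needed (an illustration only; the helper names are hypothetical and not taken from the patch). In the loop above, dst is signed times unsigned while dst3 is unsigned times signed. Arithmetically the two orders are the same operation, but in RTL they are distinct expressions, so each needs its own pattern to reach vwmulsu.vv:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of (mult (sign_extend s) (zero_extend u)),
   the form vector.md already provides for intrinsics.  */
static int16_t
model_sext_times_zext (int8_t s, uint8_t u)
{
  return (int16_t) ((int16_t) s * (int16_t) u);
}

/* Hypothetical scalar model of (mult (zero_extend u) (sign_extend s)),
   the operand order added by the new pattern.  */
static int16_t
model_zext_times_sext (uint8_t u, int8_t s)
{
  return (int16_t) ((int16_t) u * (int16_t) s);
}

/* Count the input pairs for which the two operand orders disagree.  The
   products range from -128 * 255 = -32640 to 127 * 255 = 32385, so they
   always fit in int16_t.  */
static int
count_mulsu_mismatches (void)
{
  int mismatches = 0;
  for (int s = INT8_MIN; s <= INT8_MAX; s++)
    for (int u = 0; u <= UINT8_MAX; u++)
      if (model_sext_times_zext ((int8_t) s, (uint8_t) u)
	  != model_zext_times_sext ((uint8_t) u, (int8_t) s))
	mismatches++;
  return mismatches;
}
```

Since both orders give identical results, emitting vwmulsu.vv with the signed source first is valid for either RTL shape; the sketch does not model masking or vector length.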
    
    Before this patch:
    
    ...
            vsext.vf2       v6,v1
            add     t0,a0,t4
            vzext.vf2       v4,v1
            vmul.vv v2,v4,v6
            add     t0,a1,t4
            vzext.vf2       v2,v1
            vmul.vv v4,v2,v4
            add     t0,a2,t4
            vmul.vv v2,v2,v6
            add     t0,a3,t4
            sub     t6,t6,t1
            vsext.vf2       v2,v1
            vmul.vv v2,v2,v6
    ...
    
    After this patch:
    ...
            add     t0,a0,t3
            vwmulsu.vv      v2,v1,v3
            add     t0,a1,t3
            vwmulu.vv       v4,v3,v2
            add     t0,a2,t3
            vwmulsu.vv      v3,v1,v2
            add     t0,a3,t3
            sub     t4,t4,t1
            vwmul.vv        v2,v1,v3
    ...
    
    gcc/ChangeLog:
    
            * config/riscv/vector.md: Add autovec-opt.md.
            * config/riscv/autovec-opt.md: New file.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
            * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
            * gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test.
            * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.

Diff:
---
 gcc/config/riscv/autovec-opt.md                    | 80 ++++++++++++++++++++++
 gcc/config/riscv/vector.md                         |  3 +-
 .../gcc.target/riscv/rvv/autovec/widen/widen-7.c   | 27 ++++++++
 .../riscv/rvv/autovec/widen/widen-complicate-3.c   | 32 +++++++++
 .../riscv/rvv/autovec/widen/widen-complicate-4.c   | 31 +++++++++
 .../riscv/rvv/autovec/widen/widen_run-7.c          | 34 +++++++++
 6 files changed, 206 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
new file mode 100644
index 00000000000..92cdc4e9a16
--- /dev/null
+++ b/gcc/config/riscv/autovec-opt.md
@@ -0,0 +1,80 @@
+;; Machine description for optimization of RVV auto-vectorization.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; We don't have vwmul.wv instruction like vwadd.wv in RVV.
+;; This pattern is an intermediate RTL IR as a pseudo vwmul.wv to enhance
+;; optimization of instructions combine.
+(define_insn_and_split "@pred_single_widen_mul<any_extend:su><mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand"                  "=&vr,&vr")
+	(if_then_else:VWEXTI
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"              "   rK,   rK")
+	     (match_operand 6 "const_int_operand"                  "    i,    i")
+	     (match_operand 7 "const_int_operand"                  "    i,    i")
+	     (match_operand 8 "const_int_operand"                  "    i,    i")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	  (mult:VWEXTI
+	    (any_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+	    (match_operand:VWEXTI 3 "register_operand"             "   vr,   vr"))
+	  (match_operand:VWEXTI 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_vf2 (<CODE>, <MODE>mode);
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    rtx ops[] = {tmp, operands[4]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
+
+    emit_insn (gen_pred (MULT, <MODE>mode, operands[0], operands[1], operands[2],
+			 operands[3], tmp, operands[5], operands[6],
+			 operands[7], operands[8]));
+    DONE;
+  }
+  [(set_attr "type" "viwmul")
+   (set_attr "mode" "<MODE>")])
+
+;; This pattern is to enhance the instruction combine optimizations for complicated
+;; signed and unsigned widening multiplication operations.
+(define_insn "*pred_widen_mulsu<mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand"                  "=&vr,&vr")
+	(if_then_else:VWEXTI
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"              "   rK,   rK")
+	     (match_operand 6 "const_int_operand"                  "    i,    i")
+	     (match_operand 7 "const_int_operand"                  "    i,    i")
+	     (match_operand 8 "const_int_operand"                  "    i,    i")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	  (mult:VWEXTI
+	    (zero_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+	    (sign_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand" "   vr,   vr")))
+	  (match_operand:VWEXTI 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR"
+  "vwmulsu.vv\t%0,%3,%4%p1"
+  [(set_attr "type" "viwmul")
+   (set_attr "mode" "<V_DOUBLE_TRUNC>")])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index c74dce89db6..419853a93c1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -24,7 +24,7 @@
 ;;
 ;; - Intrinsics (https://github.com/riscv/rvv-intrinsic-doc)
 ;; - Auto-vectorization (autovec.md)
-;; - Combine optimization (TBD)
+;; - Optimization (autovec-opt.md)
 
 (include "vector-iterators.md")
 
@@ -8422,3 +8422,4 @@
 )
 
 (include "autovec.md")
+(include "autovec-opt.md")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
new file mode 100644
index 00000000000..cc43d9ba3fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwmul_##TYPE1_##TYPE2 (TYPE1 *__restrict dst,   \
+						      TYPE2 *__restrict a,     \
+						      TYPE1 *__restrict b,     \
+						      int n)                   \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE1) a[i]) * b[i];                                          \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)                                                 \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvsext\.vf2} 3 } } */
+/* { dg-final { scan-assembler-times {\tvzext\.vf2} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
new file mode 100644
index 00000000000..e1fd79430c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
+    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
+    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
+    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      {                                                                        \
+	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
+	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
+	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
+	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
+      }                                                                        \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)                                                 \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmul\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvwmulu\.vv} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
new file mode 100644
index 00000000000..15fdefc550b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2, TYPE3)                                         \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
+    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
+    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE3 *__restrict b,          \
+    TYPE3 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      {                                                                        \
+	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
+	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
+	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
+	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
+      }                                                                        \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t, uint8_t)                                         \
+  TEST_TYPE (int32_t, int16_t, uint16_t)                                       \
+  TEST_TYPE (int64_t, int32_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmulsu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwmul\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwmulu\.vv} 3 } } */
+/* { dg-final { scan-assembler-not {\tvmul} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
new file mode 100644
index 00000000000..4abddd5d718
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
@@ -0,0 +1,34 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include <assert.h>
+#include "widen-7.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE1 b##TYPE1[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % LIMIT;                                         \
+      b##TYPE1[i] = LIMIT + i & LIMIT;                                         \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE1, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == (((TYPE1) a##TYPE2[i]) * b##TYPE1[i]));
+
+#define RUN_ALL()                                                              \
+  RUN (int16_t, int8_t, -128)                                                  \
+  RUN (uint16_t, uint8_t, 255)                                                 \
+  RUN (int32_t, int16_t, -32768)                                               \
+  RUN (uint32_t, uint16_t, 65535)                                              \
+  RUN (int64_t, int32_t, -2147483648)                                          \
+  RUN (uint64_t, uint32_t, 4294967295)
+
+int
+main ()
+{
+  RUN_ALL ()
+}


* [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
@ 2023-07-14  2:42 Jeff Law
From: Jeff Law @ 2023-07-14  2:42 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:37bfc9b75e9697a8bc4eb6dd87629781045d7ede

commit 37bfc9b75e9697a8bc4eb6dd87629781045d7ede
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Thu Jun 1 16:32:12 2023 +0800

