[PATCH] RISC-V: Implement vector "average" autovec pattern.

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] RISC-V: Implement vector "average" autovec pattern.
@ 2023-08-01 14:31 Robin Dapp
  2023-08-02 14:03 ` 钟居哲
  0 siblings, 1 reply; 6+ messages in thread
From: Robin Dapp @ 2023-08-01 14:31 UTC (permalink / raw)
  To: gcc-patches, palmer, Kito Cheng, juzhe.zhong, jeffreyalaw; +Cc: rdapp.gcc

Hi,

this patch adds vector average patterns

 op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
 op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1;

If there is no direct support, the vectorizer can synthesize the patterns
but, presumably due to lack of narrowing operation support, won't try
a narrowing shift.  Therefore, this patch implements the expanders instead.
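
As a scalar illustration of the two patterns above, here is a minimal C sketch
for 8-bit elements.  The function names are invented for this example and are
not part of the patch.  The signed variants assume that `>>` on a negative
value is an arithmetic shift, which is what GCC implements.

```c
#include <assert.h>
#include <stdint.h>

/* floor: op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1  */
static uint8_t
avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b) >> 1);
}

/* ceil: op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1  */
static uint8_t
avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b + 1) >> 1);
}

/* Signed floor variant (assumes arithmetic right shift).  */
static int8_t
avg_floor_s8 (int8_t a, int8_t b)
{
  return (int8_t) (((int16_t) a + (int16_t) b) >> 1);
}

/* Signed ceil variant (assumes arithmetic right shift).  */
static int8_t
avg_ceil_s8 (int8_t a, int8_t b)
{
  return (int8_t) (((int16_t) a + (int16_t) b + 1) >> 1);
}

/* Hypothetical counter-example: adding in the narrow type drops the
   carry, which is why the patterns widen first.  */
static uint8_t
avg_floor_u8_narrow_add (uint8_t a, uint8_t b)
{
  return (uint8_t) ((uint8_t) (a + b) >> 1);
}

/* A few spot checks of the functions above.  */
static void
check_avg_examples (void)
{
  assert (avg_floor_u8 (1, 2) == 1);
  assert (avg_ceil_u8 (1, 2) == 2);
  assert (avg_floor_u8 (255, 255) == 255);
  assert (avg_floor_u8_narrow_add (255, 255) == 127);
  assert (avg_floor_s8 (-1, -2) == -2);
  assert (avg_ceil_s8 (-1, -2) == -1);
}
```

The only difference between the two variants is the `+ 1` before the shift.
The widened intermediate is what keeps the carry (or sign) bit that a plain
narrow addition would lose.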

A synthesized pattern results in e.g:
	vsrl.vi	v2,v1,1
	vsrl.vi	v4,v3,1
	vand.vv	v1,v1,v3
	vadd.vv	v2,v2,v4
	vand.vi	v1,v1,1
	vadd.vv	v1,v2,v1
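
For reference, my reading of the six instructions above is that they compute
the floor average entirely in the narrow type via the identity
(a >> 1) + (b >> 1) + (a & b & 1).  The following C sketch, with invented
helper names and 8-bit unsigned elements as an assumption, checks that
identity against the widening form for all inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: widen, add, shift, narrow.  */
static uint8_t
avg_floor_wide (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned) a + (unsigned) b) >> 1);
}

/* Narrow-only fallback; comments name the instruction each step is
   assumed to correspond to in the listing above.  */
static uint8_t
avg_floor_fallback (uint8_t a, uint8_t b)
{
  uint8_t half_a = a >> 1;                      /* vsrl.vi */
  uint8_t half_b = b >> 1;                      /* vsrl.vi */
  uint8_t both = a & b;                         /* vand.vv */
  uint8_t sum = (uint8_t) (half_a + half_b);    /* vadd.vv */
  uint8_t carry = both & 1;                     /* vand.vi */
  return (uint8_t) (sum + carry);               /* vadd.vv */
}

/* Exhaustively compare both forms over all 8-bit input pairs.  */
static void
check_fallback_matches (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      assert (avg_floor_fallback ((uint8_t) a, (uint8_t) b)
	      == avg_floor_wide ((uint8_t) a, (uint8_t) b));
}
```

The fallback never overflows the narrow type because each half is at most 127
and the carry is at most 1, but it costs six operations instead of two or
three.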

With this patch we generate:
	vwadd.vv	v2,v4,v1
	vadd.vi	v2,v2,1
	vnsrl.wi	v2,v2,1

Without these expanders we still manage to recover (i.e. create the latter
sequence) for signed types but not for unsigned ones.  I figured that
offering both patterns might be the safe thing to do, but I am open to
leaving the signed one out.  In the long term we would want full vectorizer
support for this, I suppose.

Regards
 Robin

gcc/ChangeLog:

	* config/riscv/autovec.md (<u>avg<v_double_trunc>3_floor):
	Implement expander.
	(<u>avg<v_double_trunc>3_ceil): Ditto.
	* config/riscv/vector-iterators.md (ext_to_rshift): New code
	attribute.
	(EXT_TO_RSHIFT): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vec-avg-run.c: New test.
	* gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c: New test.
	* gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c: New test.
	* gcc.target/riscv/rvv/autovec/vec-avg-template.h: New test.
---
 gcc/config/riscv/autovec.md                   | 66 ++++++++++++++
 gcc/config/riscv/vector-iterators.md          |  5 ++
 .../riscv/rvv/autovec/vec-avg-run.c           | 85 +++++++++++++++++++
 .../riscv/rvv/autovec/vec-avg-rv32gcv.c       | 10 +++
 .../riscv/rvv/autovec/vec-avg-rv64gcv.c       | 10 +++
 .../riscv/rvv/autovec/vec-avg-template.h      | 33 +++++++
 6 files changed, 209 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7b784437c7e..23d3c2feaff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1752,3 +1752,69 @@ (define_expand "mask_len_fold_left_plus_<mode>"
 				    riscv_vector::reduction_type::MASK_LEN_FOLD_LEFT);
   DONE;
 })
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Average.
+;; -------------------------------------------------------------------------
+;; Implements the following "average" patterns:
+;; floor:
+;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
+;; ceil:
+;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1;
+;; -------------------------------------------------------------------------
+
+(define_expand "<u>avg<v_double_trunc>3_floor"
+ [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
+   (truncate:<V_DOUBLE_TRUNC>
+    (<ext_to_rshift>:VWEXTI
+     (plus:VWEXTI
+      (any_extend:VWEXTI
+       (match_operand:<V_DOUBLE_TRUNC> 1 "register_operand"))
+      (any_extend:VWEXTI
+       (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))))))]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (<MODE>mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, <CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then a narrowing shift.  */
+  rtx ops2[] = {operands[0], tmp1, const1_rtx};
+  icode = code_for_pred_narrow_scalar (<EXT_TO_RSHIFT>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+  DONE;
+})
+
+(define_expand "<u>avg<v_double_trunc>3_ceil"
+ [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
+   (truncate:<V_DOUBLE_TRUNC>
+    (<ext_to_rshift>:VWEXTI
+     (plus:VWEXTI
+      (plus:VWEXTI
+       (any_extend:VWEXTI
+	(match_operand:<V_DOUBLE_TRUNC> 1 "register_operand"))
+       (any_extend:VWEXTI
+	(match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
+      (const_int 1)))))]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (<MODE>mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, <CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then add 1.  */
+  rtx tmp2 = gen_reg_rtx (<MODE>mode);
+  rtx ops2[] = {tmp2, tmp1, const1_rtx};
+  icode = code_for_pred_scalar (PLUS, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+
+  /* Finally, a narrowing shift.  */
+  rtx ops3[] = {operands[0], tmp2, const1_rtx};
+  icode = code_for_pred_narrow_scalar (<EXT_TO_RSHIFT>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops3);
+  DONE;
+})
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 37c6337f1a3..409f63332c9 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1968,6 +1968,11 @@ (define_code_attr macc_msac [(plus "macc") (minus "msac")])
 (define_code_attr nmsub_nmadd [(plus "nmsub") (minus "nmadd")])
 (define_code_attr nmsac_nmacc [(plus "nmsac") (minus "nmacc")])
 
+(define_code_attr ext_to_rshift [(sign_extend "ashiftrt")
+                                 (zero_extend "lshiftrt")])
+(define_code_attr EXT_TO_RSHIFT [(sign_extend "ASHIFTRT")
+                                 (zero_extend "LSHIFTRT")])
+
 (define_code_iterator and_ior [and ior])
 
 (define_code_iterator any_float_binop [plus mult minus div])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
new file mode 100644
index 00000000000..7ca193ec2f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
@@ -0,0 +1,85 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model --param=riscv-autovec-preference=scalable -lm" } */
+
+#include <limits.h>
+#include <math.h>
+#include <assert.h>
+
+#include "vec-avg-template.h"
+
+#define SZ 256
+
+#define RUNS1(TYPE, SCALE)                                                     \
+  TYPE a##TYPE[SZ + 1];                                                        \
+  TYPE b##TYPE[SZ + 1];                                                        \
+  TYPE dst##TYPE[SZ + 1];                                                      \
+  for (int cnt = 0, i = -(SZ * SCALE) / 2; i < (SZ * SCALE) / 2; i += SCALE)   \
+    {                                                                          \
+      a##TYPE[cnt] = i;                                                        \
+      b##TYPE[cnt] = i + 1;                                                    \
+      dst##TYPE[cnt++] = 0;                                                    \
+    }                                                                          \
+  vavg_##TYPE (dst##TYPE, a##TYPE, b##TYPE, SZ);                               \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE[i] == floor ((a##TYPE[i] + b##TYPE[i]) / 2.0));
+
+#define RUNU1(TYPE, SCALE)                                                     \
+  TYPE a##TYPE[SZ + 1];                                                        \
+  TYPE b##TYPE[SZ + 1];                                                        \
+  TYPE dst##TYPE[SZ + 1];                                                      \
+  for (int cnt = 0, i = 0; i < (SZ * SCALE); i += SCALE)                       \
+    {                                                                          \
+      a##TYPE[cnt] = i;                                                        \
+      b##TYPE[cnt] = i + 1;                                                    \
+      dst##TYPE[cnt++] = 0;                                                    \
+    }                                                                          \
+  vavg_##TYPE (dst##TYPE, a##TYPE, b##TYPE, SZ);                               \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE[i] == floor ((a##TYPE[i] + b##TYPE[i]) / 2.0));
+
+#define RUNS2(TYPE, SCALE)                                                     \
+  TYPE a2##TYPE[SZ + 1];                                                       \
+  TYPE b2##TYPE[SZ + 1];                                                       \
+  TYPE dst2##TYPE[SZ + 1];                                                     \
+  for (int cnt = 0, i = -(SZ * SCALE) / 2; i < (SZ * SCALE) / 2; i += SCALE)   \
+    {                                                                          \
+      a2##TYPE[cnt] = i;                                                       \
+      b2##TYPE[cnt] = i + 1;                                                   \
+      dst2##TYPE[cnt++] = 0;                                                   \
+    }                                                                          \
+  vavg2_##TYPE (dst2##TYPE, a2##TYPE, b2##TYPE, SZ);                           \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst2##TYPE[i] == ceil ((a2##TYPE[i] + b2##TYPE[i]) / 2.0));
+
+#define RUNU2(TYPE, SCALE)                                                     \
+  TYPE a2##TYPE[SZ + 1];                                                       \
+  TYPE b2##TYPE[SZ + 1];                                                       \
+  TYPE dst2##TYPE[SZ + 1];                                                     \
+  for (int cnt = 0, i = 0; i < (SZ * SCALE); i += SCALE)                       \
+    {                                                                          \
+      a2##TYPE[cnt] = i;                                                       \
+      b2##TYPE[cnt] = i + 1;                                                   \
+      dst2##TYPE[cnt++] = 0;                                                   \
+    }                                                                          \
+  vavg2_##TYPE (dst2##TYPE, a2##TYPE, b2##TYPE, SZ);                           \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst2##TYPE[i] == ceil ((a2##TYPE[i] + b2##TYPE[i]) / 2.0));
+
+#define RUN_ALL()                                                              \
+  RUNS1 (int8_t, 1)                                                            \
+  RUNS1 (int16_t, 256)                                                         \
+  RUNS1 (int32_t, 65536)                                                       \
+  RUNU1 (uint8_t, 1)                                                           \
+  RUNU1 (uint16_t, 256)                                                        \
+  RUNU1 (uint32_t, 65536)                                                      \
+  RUNS2 (int8_t, 1)                                                            \
+  RUNS2 (int16_t, 256)                                                         \
+  RUNS2 (int32_t, 65536)                                                       \
+  RUNU2 (uint8_t, 1)                                                           \
+  RUNU2 (uint16_t, 256)                                                        \
+  RUNU2 (uint32_t, 65536)
+
+int main ()
+{
+  RUN_ALL ()
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
new file mode 100644
index 00000000000..e2754339d94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include "vec-avg-template.h"
+
+/* { dg-final { scan-assembler-times {\tvwadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwaddu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsrl\.wi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsra\.wi} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
new file mode 100644
index 00000000000..210c0dc5460
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable" } */
+
+#include "vec-avg-template.h"
+
+/* { dg-final { scan-assembler-times {\tvwadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwaddu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsrl\.wi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsra\.wi} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
new file mode 100644
index 00000000000..9c2a6f1b9cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
@@ -0,0 +1,33 @@
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE, TYPE2)                                                 \
+  __attribute__ ((noipa)) void vavg_##TYPE (TYPE *dst, TYPE *a, TYPE *b,       \
+					    int n)                             \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE2) a[i] + b[i]) >> 1;                                     \
+  }
+
+#define TEST_TYPE2(TYPE, TYPE2)                                                \
+  __attribute__ ((noipa)) void vavg2_##TYPE (TYPE *dst, TYPE *a, TYPE *b,      \
+					     int n)                            \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE2) a[i] + b[i] + 1) >> 1;                                 \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int8_t, int16_t)                                                  \
+  TEST_TYPE (uint8_t, uint16_t)                                                \
+  TEST_TYPE (int16_t, int32_t)                                                 \
+  TEST_TYPE (uint16_t, uint32_t)                                               \
+  TEST_TYPE (int32_t, int64_t)                                                 \
+  TEST_TYPE (uint32_t, uint64_t)                                               \
+  TEST_TYPE2 (int8_t, int16_t)                                                 \
+  TEST_TYPE2 (uint8_t, uint16_t)                                               \
+  TEST_TYPE2 (int16_t, int32_t)                                                \
+  TEST_TYPE2 (uint16_t, uint32_t)                                              \
+  TEST_TYPE2 (int32_t, int64_t)                                                \
+  TEST_TYPE2 (uint32_t, uint64_t)
+
+TEST_ALL()
-- 
2.41.0


* Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
  2023-08-01 14:31 [PATCH] RISC-V: Implement vector "average" autovec pattern Robin Dapp
@ 2023-08-02 14:03 ` 钟居哲
  2023-08-02 18:49   ` Robin Dapp
  0 siblings, 1 reply; 6+ messages in thread
From: 钟居哲 @ 2023-08-02 14:03 UTC (permalink / raw)
  To: rdapp.gcc, gcc-patches, palmer, kito.cheng, Jeff Law; +Cc: rdapp.gcc

I have two concerns:

1. How do you model round to -Inf (avg_floor) and round to +Inf (avg_ceil)?
2. Is it possible we could use vaadd[u] to model avg ?



juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-08-01 22:31
To: gcc-patches; palmer; Kito Cheng; juzhe.zhong@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Implement vector "average" autovec pattern.
Hi,
 
this patch adds vector average patterns
 
op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1;
 
If there is no direct support, the vectorizer can synthesize the patterns
but, presumably due to lack of narrowing operation support, won't try
a narrowing shift.  Therefore, this patch implements the expanders instead.
 
A synthesized pattern results in e.g:
vsrl.vi v2,v1,1
vsrl.vi v4,v3,1
vand.vv v1,v1,v3
vadd.vv v2,v2,v4
vand.vi v1,v1,1
vadd.vv v1,v2,v1
 
With this patch we generate:
vwadd.vv v2,v4,v1
vadd.vi v2,1
vnsrl.wi v2,v2,1
 
We manage to recover (i.e. create the latter sequence) for signed types
but not for unsigned.  I figured that offering both patterns might be the
safe thing to do but open to leaving the signed one out.  In the long
term we'd want full vectorizer support for this I suppose.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (<u>avg<v_double_trunc>3_floor):
Implement expander.
(<u>avg<v_double_trunc>3_ceil): Ditto.
* config/riscv/vector-iterators.md (ashiftrt): New iterator.
(ASHIFTRT): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vec-avg-run.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-template.h: New test.
---
gcc/config/riscv/autovec.md                   | 66 ++++++++++++++
gcc/config/riscv/vector-iterators.md          |  5 ++
.../riscv/rvv/autovec/vec-avg-run.c           | 85 +++++++++++++++++++
.../riscv/rvv/autovec/vec-avg-rv32gcv.c       | 10 +++
.../riscv/rvv/autovec/vec-avg-rv64gcv.c       | 10 +++
.../riscv/rvv/autovec/vec-avg-template.h      | 33 +++++++
6 files changed, 209 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7b784437c7e..23d3c2feaff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1752,3 +1752,69 @@ (define_expand "mask_len_fold_left_plus_<mode>"
    riscv_vector::reduction_type::MASK_LEN_FOLD_LEFT);
   DONE;
})
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Average.
+;; -------------------------------------------------------------------------
+;; Implements the following "average" patterns:
+;; floor:
+;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
+;; ceil:
+;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1)) >> 1;
+;; -------------------------------------------------------------------------
+
+(define_expand "<u>avg<v_double_trunc>3_floor"
+ [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
+   (truncate:<V_DOUBLE_TRUNC>
+    (<ext_to_rshift>:VWEXTI
+     (plus:VWEXTI
+      (any_extend:VWEXTI
+       (match_operand:<V_DOUBLE_TRUNC> 1 "register_operand"))
+      (any_extend:VWEXTI
+       (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))))))]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (<MODE>mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, <CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then a narrowing shift.  */
+  rtx ops2[] = {operands[0], tmp1, const1_rtx};
+  icode = code_for_pred_narrow_scalar (<EXT_TO_RSHIFT>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+  DONE;
+})
+
+(define_expand "<u>avg<v_double_trunc>3_ceil"
+ [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
+   (truncate:<V_DOUBLE_TRUNC>
+    (<ext_to_rshift>:VWEXTI
+     (plus:VWEXTI
+      (plus:VWEXTI
+       (any_extend:VWEXTI
+ (match_operand:<V_DOUBLE_TRUNC> 1 "register_operand"))
+       (any_extend:VWEXTI
+ (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
+      (const_int 1)))))]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (<MODE>mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, <CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then add 1.  */
+  rtx tmp2 = gen_reg_rtx (<MODE>mode);
+  rtx ops2[] = {tmp2, tmp1, const1_rtx};
+  icode = code_for_pred_scalar (PLUS, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+
+  /* Finally, a narrowing shift.  */
+  rtx ops3[] = {operands[0], tmp2, const1_rtx};
+  icode = code_for_pred_narrow_scalar (<EXT_TO_RSHIFT>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops3);
+  DONE;
+})
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 37c6337f1a3..409f63332c9 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1968,6 +1968,11 @@ (define_code_attr macc_msac [(plus "macc") (minus "msac")])
(define_code_attr nmsub_nmadd [(plus "nmsub") (minus "nmadd")])
(define_code_attr nmsac_nmacc [(plus "nmsac") (minus "nmacc")])
+(define_code_attr ext_to_rshift [(sign_extend "ashiftrt")
+                                 (zero_extend "lshiftrt")])
+(define_code_attr EXT_TO_RSHIFT [(sign_extend "ASHIFTRT")
+                                 (zero_extend "LSHIFTRT")])
+
(define_code_iterator and_ior [and ior])
(define_code_iterator any_float_binop [plus mult minus div])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
new file mode 100644
index 00000000000..7ca193ec2f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
@@ -0,0 +1,85 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model --param=riscv-autovec-preference=scalable -lm" } */
+
+#include <limits.h>
+#include <math.h>
+#include <assert.h>
+
+#include "vec-avg-template.h"
+
+#define SZ 256
+
+#define RUNS1(TYPE, SCALE)                                                     \
+  TYPE a##TYPE[SZ + 1];                                                        \
+  TYPE b##TYPE[SZ + 1];                                                        \
+  TYPE dst##TYPE[SZ + 1];                                                      \
+  for (int cnt = 0, i = -(SZ * SCALE) / 2; i < (SZ * SCALE) / 2; i += SCALE)   \
+    {                                                                          \
+      a##TYPE[cnt] = i;                                                        \
+      b##TYPE[cnt] = i + 1;                                                    \
+      dst##TYPE[cnt++] = 0;                                                    \
+    }                                                                          \
+  vavg_##TYPE (dst##TYPE, a##TYPE, b##TYPE, SZ);                               \
+  for (int i = 0; i < SZ; i += SCALE)                                          \
+    assert (dst##TYPE[i] == floor ((a##TYPE[i] + b##TYPE[i]) / 2.0));
+
+#define RUNU1(TYPE, SCALE)                                                     \
+  TYPE a##TYPE[SZ + 1];                                                        \
+  TYPE b##TYPE[SZ + 1];                                                        \
+  TYPE dst##TYPE[SZ + 1];                                                      \
+  for (int cnt = 0, i = 0; i < (SZ * SCALE); i += SCALE)                       \
+    {                                                                          \
+      a##TYPE[cnt] = i;                                                        \
+      b##TYPE[cnt] = i + 1;                                                    \
+      dst##TYPE[cnt++] = 0;                                                    \
+    }                                                                          \
+  vavg_##TYPE (dst##TYPE, a##TYPE, b##TYPE, SZ);                               \
+  for (int i = 0; i < SZ; i += SCALE)                                          \
+    assert (dst##TYPE[i] == floor ((a##TYPE[i] + b##TYPE[i]) / 2.0));
+
+#define RUNS2(TYPE, SCALE)                                                     \
+  TYPE a2##TYPE[SZ + 1];                                                       \
+  TYPE b2##TYPE[SZ + 1];                                                       \
+  TYPE dst2##TYPE[SZ + 1];                                                     \
+  for (int cnt = 0, i = -(SZ * SCALE) / 2; i < (SZ * SCALE) / 2; i += SCALE)   \
+    {                                                                          \
+      a2##TYPE[cnt] = i;                                                       \
+      b2##TYPE[cnt] = i + 1;                                                   \
+      dst2##TYPE[cnt++] = 0;                                                   \
+    }                                                                          \
+  vavg2_##TYPE (dst2##TYPE, a2##TYPE, b2##TYPE, SZ);                           \
+  for (int i = 0; i < SZ; i += SCALE)                                          \
+    assert (dst2##TYPE[i] == ceil ((a2##TYPE[i] + b2##TYPE[i]) / 2.0));
+
+#define RUNU2(TYPE, SCALE)                                                     \
+  TYPE a2##TYPE[SZ + 1];                                                       \
+  TYPE b2##TYPE[SZ + 1];                                                       \
+  TYPE dst2##TYPE[SZ + 1];                                                     \
+  for (int cnt = 0, i = 0; i < (SZ * SCALE); i += SCALE)                       \
+    {                                                                          \
+      a2##TYPE[cnt] = i;                                                       \
+      b2##TYPE[cnt] = i + 1;                                                   \
+      dst2##TYPE[cnt++] = 0;                                                   \
+    }                                                                          \
+  vavg2_##TYPE (dst2##TYPE, a2##TYPE, b2##TYPE, SZ);                           \
+  for (int i = 0; i < SZ; i += SCALE)                                          \
+    assert (dst2##TYPE[i] == ceil ((a2##TYPE[i] + b2##TYPE[i]) / 2.0));
+
+#define RUN_ALL()                                                              \
+  RUNS1 (int8_t, 1)                                                            \
+  RUNS1 (int16_t, 256)                                                         \
+  RUNS1 (int32_t, 65536)                                                       \
+  RUNU1 (uint8_t, 1)                                                           \
+  RUNU1 (uint16_t, 256)                                                        \
+  RUNU1 (uint32_t, 65536)                                                      \
+  RUNS2 (int8_t, 1)                                                            \
+  RUNS2 (int16_t, 256)                                                         \
+  RUNS2 (int32_t, 65536)                                                       \
+  RUNU2 (uint8_t, 1)                                                           \
+  RUNU2 (uint16_t, 256)                                                        \
+  RUNU2 (uint32_t, 65536)\
+
+int main ()
+{
+  RUN_ALL ()
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
new file mode 100644
index 00000000000..e2754339d94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include "vec-avg-template.h"
+
+/* { dg-final { scan-assembler-times {\tvwadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwaddu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsrl.wi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsra.wi} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
new file mode 100644
index 00000000000..210c0dc5460
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable" } */
+
+#include "vec-avg-template.h"
+
+/* { dg-final { scan-assembler-times {\tvwadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwaddu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsrl\.wi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsra\.wi} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
new file mode 100644
index 00000000000..9c2a6f1b9cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
@@ -0,0 +1,33 @@
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE, TYPE2)                                                 \
+  __attribute__ ((noipa)) void vavg_##TYPE (TYPE *dst, TYPE *a, TYPE *b,       \
+     int n)                             \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE2) a[i] + b[i]) >> 1;                                     \
+  }
+
+#define TEST_TYPE2(TYPE, TYPE2)                                                \
+  __attribute__ ((noipa)) void vavg2_##TYPE (TYPE *dst, TYPE *a, TYPE *b,      \
+      int n)                            \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE2) a[i] + b[i] + 1) >> 1;                                 \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int8_t, int16_t)                                                  \
+  TEST_TYPE (uint8_t, uint16_t)                                                \
+  TEST_TYPE (int16_t, int32_t)                                                 \
+  TEST_TYPE (uint16_t, uint32_t)                                               \
+  TEST_TYPE (int32_t, int64_t)                                                 \
+  TEST_TYPE (uint32_t, uint64_t)                                               \
+  TEST_TYPE2 (int8_t, int16_t)                                                 \
+  TEST_TYPE2 (uint8_t, uint16_t)                                               \
+  TEST_TYPE2 (int16_t, int32_t)                                                \
+  TEST_TYPE2 (uint16_t, uint32_t)                                              \
+  TEST_TYPE2 (int32_t, int64_t)                                                \
+  TEST_TYPE2 (uint32_t, uint64_t)
+
+TEST_ALL()
-- 
2.41.0
 


* Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
  2023-08-02 14:03 ` 钟居哲
@ 2023-08-02 18:49   ` Robin Dapp
  2023-08-02 21:36     ` 钟居哲
  2023-08-02 21:44     ` 钟居哲
  0 siblings, 2 replies; 6+ messages in thread
From: Robin Dapp @ 2023-08-02 18:49 UTC (permalink / raw)
  To: 钟居哲, gcc-patches, palmer, kito.cheng, Jeff Law
  Cc: rdapp.gcc

> 1. How do you model round to -Inf (avg_floor) and round to +Inf (avg_ceil) ?

That's just specified by the +1 or the lack of it in the original pattern.
Actually the IFN is just a detour because we would create perfect code
if not for the fallback.  But as there is currently no way to check for
the existence of a narrowing shift we cannot circumvent the fallback.
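
For illustration only (not part of the patch): a minimal scalar sketch of the
two patterns for 8-bit elements.  The function names are hypothetical.  The
signed variants assume that >> on a negative value is an arithmetic shift,
which GCC implements but the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Floor average: widen, add, shift right by one, narrow.
   Widening keeps the carry bit that an 8-bit add would lose.  */
uint8_t avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b) >> 1);
}

/* Ceil average: the same, with +1 added before the shift.  */
uint8_t avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((uint16_t) a + (uint16_t) b + 1) >> 1);
}

/* Signed variants; these rely on an arithmetic right shift (assumption).  */
int8_t avg_floor_s8 (int8_t a, int8_t b)
{
  return (int8_t) (((int16_t) a + (int16_t) b) >> 1);
}

int8_t avg_ceil_s8 (int8_t a, int8_t b)
{
  return (int8_t) (((int16_t) a + (int16_t) b + 1) >> 1);
}
```

For example, avg_floor_u8 (1, 2) gives 1 and avg_ceil_u8 (1, 2) gives 2, so
the only difference between the two variants is the +1.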

> 2. Is it possible we could use vaadd[u] to model avg ?
In principle yes (I first misread the spec as requiring that overflow must not
happen, but it actually says that overflow does not happen).
However, we would need to set a rounding mode before vaadd, or check its current
value and provide a fallback.  Offhand I can't think of a workaround such as
using two vaadds.
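
For illustration only: a scalar sketch of why the rounding mode matters, based
on my reading of the RVV 1.0 spec for vaaddu and the vxrm CSR.  The model
function and the enum are hypothetical names, and only the rnu and rdn modes
are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Two of the four fixed-point rounding modes held in vxrm
   (encodings as I read them in the RVV 1.0 spec).  */
enum vxrm_mode { VXRM_RNU = 0, VXRM_RDN = 2 };

/* Hypothetical scalar model of one 8-bit vaaddu element:
   the sum is formed without overflow, then shifted right by one
   with a rounding increment that depends on vxrm.  */
uint8_t model_vaaddu_u8 (uint8_t a, uint8_t b, enum vxrm_mode mode)
{
  uint16_t sum = (uint16_t) (a + b);
  /* rnu adds the bit that is shifted out; rdn adds nothing.  */
  uint16_t round = (mode == VXRM_RNU) ? (uint16_t) (sum & 1) : 0;
  return (uint8_t) ((sum >> 1) + round);
}
```

Under this model rdn matches the floor pattern and rnu matches the ceil
pattern, so each variant would need vxrm set to a known value first.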

Regards
 Robin


* Re: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
  2023-08-02 18:49   ` Robin Dapp
@ 2023-08-02 21:36     ` 钟居哲
  2023-08-02 21:44     ` 钟居哲
  1 sibling, 0 replies; 6+ messages in thread
From: 钟居哲 @ 2023-08-02 21:36 UTC (permalink / raw)
  To: rdapp.gcc, gcc-patches, palmer, kito.cheng, Jeff Law; +Cc: rdapp.gcc


I just checked LLVM:
https://godbolt.org/z/nMa6qnEeT 

This patch generally is reasonable so LGTM.



juzhe.zhong@rivai.ai
 


* Re: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
  2023-08-02 18:49   ` Robin Dapp
  2023-08-02 21:36     ` 钟居哲
@ 2023-08-02 21:44     ` 钟居哲
  2023-08-15 11:36       ` Robin Dapp
  1 sibling, 1 reply; 6+ messages in thread
From: 钟居哲 @ 2023-08-02 21:44 UTC (permalink / raw)
  To: rdapp.gcc, gcc-patches, palmer, kito.cheng, Jeff Law; +Cc: rdapp.gcc


Please put your test cases into:

# widening operation only test on LMUL < 8
set AUTOVEC_TEST_OPTS [list \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m1} \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} ]
foreach op $AUTOVEC_TEST_OPTS {
  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/widen/*.\[cS\]]] \
    "" "$op"
}

You could either simply put them into the "widen" directory or create a new directory.
Anyway, make sure you have fully tested it with LMUL = 1/2/4.



juzhe.zhong@rivai.ai
 


* Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
  2023-08-02 21:44     ` 钟居哲
@ 2023-08-15 11:36       ` Robin Dapp
  0 siblings, 0 replies; 6+ messages in thread
From: Robin Dapp @ 2023-08-15 11:36 UTC (permalink / raw)
  To: 钟居哲, gcc-patches, palmer, kito.cheng, Jeff Law
  Cc: rdapp.gcc

> Please put your test cases into:
> 
> # widening operation only test on LMUL < 8
> set AUTOVEC_TEST_OPTS [list \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m1} \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} ]
> foreach op $AUTOVEC_TEST_OPTS {
>   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/widen/*.\[cS\]]] \
>     "" "$op"
> }
> 
> You could either simply put them into the "widen" directory or create a new directory.
> Anyway, make sure you have fully tested it with LMUL = 1/2/4.

Ah, almost forgot this.  I moved the tests to the widen directory
and will push it after testing.

Regards
 Robin
