* [PATCH v1] RISC-V: Support FP roundeven auto-vectorization
@ 2023-09-27 8:20 pan2.li
2023-09-27 8:23 ` juzhe.zhong
0 siblings, 1 reply; 3+ messages in thread
From: pan2.li @ 2023-09-27 8:20 UTC (permalink / raw)
To: gcc-patches; +Cc: juzhe.zhong, pan2.li, yanzhang.wang, kito.cheng
From: Pan Li <pan2.li@intel.com>
This patch adds auto-vectorization support for the roundeven API in
math.h. It depends on the -ffast-math option.
A call such as v2 = roundeven (v1) is converted into the insns below
(following the LLVM implementation).
* vfcvt.x.f v3, v1, RNE
* vfcvt.f.x v2, v3
However, a floating-point value does not need the conversions above if
it has no fractional part. For example, in single precision:
+-----------+---------------+-----------------+
| raw float | binary layout | after roundeven |
+-----------+---------------+-----------------+
| 8388607.5 | 0x4affffff    | 8388608.0       |
| 8388608.0 | 0x4b000000    | 8388608.0       |
| 8388609.0 | 0x4b000001    | 8388609.0       |
+-----------+---------------+-----------------+
Every single-precision value whose magnitude is greater than or equal to
8388608.0 (2^23) has no fractional part. We use vmflt to build a mask
that filters these elements out, and do the conversions only under mask.
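As a quick sanity check of the table above, here is a small C sketch. It is
not part of the patch; the helper names float_from_bits, gap_above and
check_threshold are made up for illustration. It decodes the three bit
patterns and checks that neighbouring floats are 0.5 apart just below 2^23
and 1.0 apart from 2^23 on, which is why nothing at or above the threshold
can carry a fraction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): reinterpret a 32-bit
   pattern as an IEEE 754 single-precision value.  */
static float
float_from_bits (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Distance from the float encoded by BITS to the next larger float.  */
static float
gap_above (uint32_t bits)
{
  return float_from_bits (bits + 1) - float_from_bits (bits);
}

/* Check the encodings from the table and the spacing around 2^23.  */
static void
check_threshold (void)
{
  assert (float_from_bits (0x4affffffu) == 8388607.5f);
  assert (float_from_bits (0x4b000000u) == 8388608.0f);
  assert (float_from_bits (0x4b000001u) == 8388609.0f);

  /* Below 2^23 a half can still be represented ...  */
  assert (gap_above (0x4afffffeu) == 0.5f);
  /* ... from 2^23 on the spacing is 1.0, so every value is integral.  */
  assert (gap_above (0x4b000000u) == 1.0f);
}
```

Calling check_threshold () from any test harness should pass silently on a
target with IEEE 754 binary32 floats.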
Before this patch:
math-roundeven-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call roundeven
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
...
fsrmi 0 // Rounding to nearest, ties to even
.L4:
vfabs.v v1,v2
vmflt.vf v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv v1,v1,v2
bne .L4
.L14:
fsrm a6
ret
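To make the five vector insns in the loop above easier to follow, here is a
scalar C sketch of what a single element goes through. It is only a model
under assumptions: the names bits_of, float_of, cvt_x_f_rne and
roundeven_model are hypothetical and do not exist in the patch, and the RNE
conversion is emulated with a truncating cast plus an explicit tie check
rather than a real vfcvt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, for illustration only.  */
static uint32_t
bits_of (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static float
float_of (uint32_t u)
{
  float x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Model of vfcvt.x.f.v with FRM = RNE, valid only for |x| < 2^23.  */
static int32_t
cvt_x_f_rne (float x)
{
  int32_t i = (int32_t) x;         /* Truncates toward zero.  */
  float frac = x - (float) i;      /* Exact for |x| < 2^23.  */
  float afrac = frac < 0.0f ? -frac : frac;

  /* Round away from zero when past the half, or on a tie if I is odd.  */
  if (afrac > 0.5f || (afrac == 0.5f && (i & 1)))
    i += x < 0.0f ? -1 : 1;
  return i;
}

/* Scalar model of one element of the vectorized loop.  */
static float
roundeven_model (float x)
{
  const float limit = 8388608.0f;                     /* 2^23 */
  float res = float_of (bits_of (x) & 0x7fffffffu);   /* vfabs.v */
  int mask = res < limit;                             /* vmflt.vf */

  if (mask)
    res = (float) cvt_x_f_rne (x);   /* vfcvt.x.f.v, then vfcvt.f.x.v */

  /* vfsgnj.vv: take the sign of the input, so that -0.5 gives -0.0.  */
  return float_of ((bits_of (res) & 0x7fffffffu)
                   | (bits_of (x) & 0x80000000u));
}

static void
check_roundeven_model (void)
{
  assert (roundeven_model (2.5f) == 2.0f);
  assert (roundeven_model (3.5f) == 4.0f);
  assert (roundeven_model (-3.5f) == -4.0f);
  assert (roundeven_model (8388607.5f) == 8388608.0f);
  assert (roundeven_model (8388609.0f) == 8388609.0f);
  assert (bits_of (roundeven_model (-0.5f)) == 0x80000000u);
}
```

Elements with the mask bit clear keep the value written by the abs step, so
the final sign injection simply gives back the original input for them.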
Please note that VLS modes are also handled by this patch and covered by
the test cases. More run tests will be added later with zfa support.
gcc/ChangeLog:
* config/riscv/autovec.md (roundeven<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_roundeven): New func decl.
* config/riscv/riscv-v.cc (expand_vec_roundeven): New func impl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
---
gcc/config/riscv/autovec.md | 10 ++++
gcc/config/riscv/riscv-protos.h | 5 ++
gcc/config/riscv/riscv-v.cc | 24 ++++++++
.../riscv/rvv/autovec/unop/math-roundeven-0.c | 23 ++++++++
.../riscv/rvv/autovec/unop/math-roundeven-1.c | 23 ++++++++
.../riscv/rvv/autovec/unop/math-roundeven-2.c | 23 ++++++++
.../riscv/rvv/autovec/unop/math-roundeven-3.c | 25 +++++++++
.../riscv/rvv/autovec/vls/math-roundeven-1.c | 56 +++++++++++++++++++
8 files changed, 189 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 680a3374972..cd0cbdd2889 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2271,3 +2271,13 @@ (define_expand "btrunc<mode>2"
DONE;
}
)
+
+(define_expand "roundeven<mode>2"
+ [(match_operand:V_VLSF 0 "register_operand")
+ (match_operand:V_VLSF 1 "register_operand")]
+ "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+ {
+ riscv_vector::expand_vec_roundeven (operands[0], operands[1], <MODE>mode, <VCONVERT>mode);
+ DONE;
+ }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 536e70bdcd3..368982a447b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -259,6 +259,9 @@ enum insn_flags : unsigned int
/* Means INSN has FRM operand and the value is FRM_RMM. */
FRM_RMM_P = 1 << 18,
+
+ /* Means INSN has FRM operand and the value is FRM_RNE. */
+ FRM_RNE_P = 1 << 19,
};
enum insn_type : unsigned int
@@ -303,6 +306,7 @@ enum insn_type : unsigned int
UNARY_OP_TAMU_FRM_RUP = UNARY_OP_TAMU | FRM_RUP_P,
UNARY_OP_TAMU_FRM_RDN = UNARY_OP_TAMU | FRM_RDN_P,
UNARY_OP_TAMU_FRM_RMM = UNARY_OP_TAMU | FRM_RMM_P,
+ UNARY_OP_TAMU_FRM_RNE = UNARY_OP_TAMU | FRM_RNE_P,
/* Binary operator. */
BINARY_OP = __NORMAL_OP | BINARY_OP_P,
@@ -469,6 +473,7 @@ void expand_vec_nearbyint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_rint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_round (rtx, rtx, machine_mode, machine_mode);
void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8992977a51d..359fb2ced8b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -332,6 +332,8 @@ public:
add_rounding_mode_operand (FRM_RDN);
else if (m_insn_flags & FRM_RMM_P)
add_rounding_mode_operand (FRM_RMM);
+ else if (m_insn_flags & FRM_RNE_P)
+ add_rounding_mode_operand (FRM_RNE);
gcc_assert (insn_data[(int) icode].n_operands == m_opno);
expand (icode, any_mem_p);
@@ -3776,4 +3778,26 @@ expand_vec_trunc (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
emit_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
}
+void
+expand_vec_roundeven (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
+ machine_mode vec_int_mode)
+{
+ /* Step-1: Get the abs float value for mask generation. */
+ emit_vec_abs (op_0, op_1, vec_fp_mode);
+
+ /* Step-2: Generate the mask on const fp. */
+ rtx const_fp = get_fp_rounding_coefficient (GET_MODE_INNER (vec_fp_mode));
+ rtx mask = emit_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
+
+ /* Step-3: Convert to integer on mask, rounding to nearest, ties to even. */
+ rtx tmp = gen_reg_rtx (vec_int_mode);
+ emit_vec_cvt_x_f (tmp, op_1, mask, UNARY_OP_TAMU_FRM_RNE, vec_fp_mode);
+
+ /* Step-4: Convert to floating-point on mask for the roundeven result. */
+ emit_vec_cvt_f_x (op_0, tmp, mask, UNARY_OP_TAMU_FRM_RNE, vec_fp_mode);
+
+ /* Step-5: Retrieve the sign bit for -0.0. */
+ emit_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
+}
+
} // namespace riscv_vector
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c
new file mode 100644
index 00000000000..ab65e372f0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test__Float16___builtin_roundevenf16:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+0
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e16,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_UNARY_CALL (_Float16, __builtin_roundevenf16)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c
new file mode 100644
index 00000000000..fac85ed0895
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_roundevenf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+0
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_UNARY_CALL (float, __builtin_roundevenf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c
new file mode 100644
index 00000000000..074f1b4a1ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double___builtin_roundeven:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+0
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_UNARY_CALL (double, __builtin_roundeven)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c
new file mode 100644
index 00000000000..c95e8eca007
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_roundevenf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+0
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_COND_UNARY_CALL (float, __builtin_roundevenf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c
new file mode 100644
index 00000000000..8489d39481f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 -ffast-math -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_OP_V (roundevenf16, 1, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 2, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 4, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 8, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 16, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 32, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 64, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 128, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 256, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 512, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 1024, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 2048, _Float16, __builtin_roundevenf16)
+
+DEF_OP_V (roundevenf, 1, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 2, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 4, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 8, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 16, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 32, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 64, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 128, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 256, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 512, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 1024, float, __builtin_roundevenf)
+
+DEF_OP_V (roundeven, 1, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 2, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 4, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 8, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 16, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 32, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 64, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 128, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 256, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 512, double, __builtin_roundeven)
+
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
+/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
--
2.34.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v1] RISC-V: Support FP roundeven auto-vectorization
2023-09-27 8:20 [PATCH v1] RISC-V: Support FP roundeven auto-vectorization pan2.li
@ 2023-09-27 8:23 ` juzhe.zhong
2023-09-27 8:29 ` Li, Pan2
0 siblings, 1 reply; 3+ messages in thread
From: juzhe.zhong @ 2023-09-27 8:23 UTC (permalink / raw)
To: pan2.li, gcc-patches; +Cc: pan2.li, yanzhang.wang, kito.cheng
LGTM
juzhe.zhong@rivai.ai
^ permalink raw reply [flat|nested] 3+ messages in thread
* RE: [PATCH v1] RISC-V: Support FP roundeven auto-vectorization
2023-09-27 8:23 ` juzhe.zhong
@ 2023-09-27 8:29 ` Li, Pan2
0 siblings, 0 replies; 3+ messages in thread
From: Li, Pan2 @ 2023-09-27 8:29 UTC (permalink / raw)
To: juzhe.zhong, gcc-patches; +Cc: Wang, Yanzhang, kito.cheng
Committed, thanks Juzhe.
Pan
UNARY_OP_TAMU_FRM_RMM = UNARY_OP_TAMU | FRM_RMM_P,
+ UNARY_OP_TAMU_FRM_RNE = UNARY_OP_TAMU | FRM_RNE_P,
/* Binary operator. */
BINARY_OP = __NORMAL_OP | BINARY_OP_P,
@@ -469,6 +473,7 @@ void expand_vec_nearbyint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_rint (rtx, rtx, machine_mode, machine_mode);
void expand_vec_round (rtx, rtx, machine_mode, machine_mode);
void expand_vec_trunc (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_roundeven (rtx, rtx, machine_mode, machine_mode);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8992977a51d..359fb2ced8b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -332,6 +332,8 @@ public:
add_rounding_mode_operand (FRM_RDN);
else if (m_insn_flags & FRM_RMM_P)
add_rounding_mode_operand (FRM_RMM);
+ else if (m_insn_flags & FRM_RNE_P)
+ add_rounding_mode_operand (FRM_RNE);
gcc_assert (insn_data[(int) icode].n_operands == m_opno);
expand (icode, any_mem_p);
@@ -3776,4 +3778,26 @@ expand_vec_trunc (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
emit_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
}
+void
+expand_vec_roundeven (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
+ machine_mode vec_int_mode)
+{
+ /* Step-1: Get the abs float value for mask generation. */
+ emit_vec_abs (op_0, op_1, vec_fp_mode);
+
+ /* Step-2: Generate the mask on const fp. */
+ rtx const_fp = get_fp_rounding_coefficient (GET_MODE_INNER (vec_fp_mode));
+ rtx mask = emit_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
+
+ /* Step-3: Convert to integer on mask, rounding to nearest, ties to even. */
+ rtx tmp = gen_reg_rtx (vec_int_mode);
+ emit_vec_cvt_x_f (tmp, op_1, mask, UNARY_OP_TAMU_FRM_RNE, vec_fp_mode);
+
+ /* Step-4: Convert to floating-point on mask for the rint result. */
+ emit_vec_cvt_f_x (op_0, tmp, mask, UNARY_OP_TAMU_FRM_RNE, vec_fp_mode);
+
+ /* Step-5: Retrieve the sign bit for -0.0. */
+ emit_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
+}
+
} // namespace riscv_vector
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c
new file mode 100644
index 00000000000..ab65e372f0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test__Float16___builtin_roundevenf16:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+0
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e16,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_UNARY_CALL (_Float16, __builtin_roundevenf16)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c
new file mode 100644
index 00000000000..fac85ed0895
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_roundevenf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+0
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_UNARY_CALL (float, __builtin_roundevenf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c
new file mode 100644
index 00000000000..074f1b4a1ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_double___builtin_roundeven:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+0
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_UNARY_CALL (double, __builtin_roundeven)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c
new file mode 100644
index 00000000000..c95e8eca007
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "test-math.h"
+
+/*
+** test_float___builtin_roundevenf:
+** frrm\s+[atx][0-9]+
+** ...
+** fsrmi\s+0
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
+** vfabs\.v\s+v[0-9]+,\s*v[0-9]+
+** vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
+** vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
+** vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
+** ...
+** vmerge\.vvm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+,\s*v0
+** ...
+** fsrm\s+[atx][0-9]+
+** ...
+*/
+TEST_COND_UNARY_CALL (float, __builtin_roundevenf)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c
new file mode 100644
index 00000000000..8489d39481f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 -ffast-math -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_OP_V (roundevenf16, 1, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 2, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 4, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 8, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 16, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 32, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 64, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 128, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 256, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 512, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 1024, _Float16, __builtin_roundevenf16)
+DEF_OP_V (roundevenf16, 2048, _Float16, __builtin_roundevenf16)
+
+DEF_OP_V (roundevenf, 1, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 2, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 4, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 8, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 16, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 32, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 64, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 128, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 256, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 512, float, __builtin_roundevenf)
+DEF_OP_V (roundevenf, 1024, float, __builtin_roundevenf)
+
+DEF_OP_V (roundeven, 1, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 2, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 4, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 8, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 16, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 32, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 64, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 128, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 256, double, __builtin_roundeven)
+DEF_OP_V (roundeven, 512, double, __builtin_roundeven)
+
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
+/* { dg-final { scan-assembler-times {vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
+/* { dg-final { scan-assembler-times {vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t} 30 } } */
--
2.34.1