* [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.
@ 2024-07-17 7:02 Jennifer Schmitz
2024-07-17 7:57 ` Richard Sandiford
0 siblings, 1 reply; 6+ messages in thread
From: Jennifer Schmitz @ 2024-07-17 7:02 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford, Kyrylo Tkachov
This patch folds signed SVE division where all divisor elements are the same
power of 2 to svasrd. Tests were added to check 1) whether the transform is
applied, i.e. asrd is used, and 2) correctness for all possible input types
for svdiv, predication, and a variety of values. As the transform is applied
only to signed integers, correctness for predication and values was only
tested for svint32_t and svint64_t.
Existing svdiv tests were adjusted such that the divisor is no longer a
power of 2.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
fold and expand.
gcc/testsuite/
* gcc.target/aarch64/sve/div_const_1.c: New test.
* gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
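As an illustration only (not part of the patch; the divisor and element type are chosen to match the new div_const_1.c test), the fold turns

  #include <arm_sve.h>

  svint64_t f (svbool_t pg, svint64_t x)
  {
    return svdiv_n_s64_x (pg, x, 4);
  }

from a mov #4 plus sdiv sequence into a single asrd by #2 (arithmetic shift right with rounding towards zero).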
[-- Attachment #1.2: 0001-SVE-intrinsics-Add-strength-reduction-for-division-b.patch --]
[-- Type: application/octet-stream, Size: 14622 bytes --]
From e8ffbab52ad7b9307cbfc9dbca4ef4d20e08804b Mon Sep 17 00:00:00 2001
From: Jennifer Schmitz <jschmitz@nvidia.com>
Date: Tue, 16 Jul 2024 01:59:50 -0700
Subject: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by
constant.
This patch folds signed SVE division where all divisor elements are the same
power of 2 to svasrd. Tests were added to check 1) whether the transform is
applied, i.e. asrd is used, and 2) correctness for all possible input types
for svdiv, predication, and a variety of values. As the transform is applied
only to signed integers, correctness for predication and values was only
tested for svint32_t and svint64_t.
Existing svdiv tests were adjusted such that the divisor is no longer a
power of 2.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
fold and expand.
gcc/testsuite/
* gcc.target/aarch64/sve/div_const_1.c: New test.
* gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
---
.../aarch64/aarch64-sve-builtins-base.cc | 44 ++++++++-
.../gcc.target/aarch64/sve/acle/asm/div_s32.c | 60 ++++++------
.../gcc.target/aarch64/sve/acle/asm/div_s64.c | 60 ++++++------
.../gcc.target/aarch64/sve/div_const_1.c | 34 +++++++
.../gcc.target/aarch64/sve/div_const_1_run.c | 91 +++++++++++++++++++
5 files changed, 228 insertions(+), 61 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index aa26370d397..d821cc96588 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -746,6 +746,48 @@ public:
}
};
+class svdiv_impl : public unspec_based_function
+{
+public:
+  CONSTEXPR svdiv_impl ()
+    : unspec_based_function (DIV, UDIV, UNSPEC_COND_FDIV) {}
+
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+    tree divisor = gimple_call_arg (f.call, 2);
+    tree divisor_cst = uniform_integer_cst_p (divisor);
+
+    if (f.type_suffix (0).unsigned_p)
+      {
+        return NULL;
+      }
+
+    if (!divisor_cst)
+      {
+        return NULL;
+      }
+
+    if (!integer_pow2p (divisor_cst))
+      {
+        return NULL;
+      }
+
+    function_instance instance ("svasrd", functions::svasrd, shapes::shift_right_imm, MODE_n, f.type_suffix_ids, GROUP_none, f.pred);
+    gcall *call = as_a <gcall *> (f.redirect_call (instance));
+    tree shift_amt = wide_int_to_tree (TREE_TYPE (divisor_cst), tree_log2 (divisor_cst));
+    gimple_call_set_arg (call, 2, shift_amt);
+    return call;
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+    return e.map_to_rtx_codes (DIV, UDIV, UNSPEC_COND_FDIV, -1, DEFAULT_MERGE_ARGNO);
+  }
+};
+
+
class svdot_impl : public function_base
{
public:
@@ -3043,7 +3085,7 @@ FUNCTION (svcreate3, svcreate_impl, (3))
FUNCTION (svcreate4, svcreate_impl, (4))
FUNCTION (svcvt, svcvt_impl,)
FUNCTION (svcvtnt, CODE_FOR_MODE0 (aarch64_sve_cvtnt),)
-FUNCTION (svdiv, rtx_code_function, (DIV, UDIV, UNSPEC_COND_FDIV))
+FUNCTION (svdiv, svdiv_impl,)
FUNCTION (svdivr, rtx_code_function_rotated, (DIV, UDIV, UNSPEC_COND_FDIV))
FUNCTION (svdot, svdot_impl,)
FUNCTION (svdot_lane, svdotprod_lane_impl, (UNSPEC_SDOT, UNSPEC_UDOT,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
index c49ca1aa524..da2fe7c5451 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
@@ -54,25 +54,25 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
z0 = svdiv_m (p0, z1, x0))
/*
-** div_2_s32_m_tied1:
-** mov (z[0-9]+\.s), #2
+** div_3_s32_m_tied1:
+** mov (z[0-9]+\.s), #3
** sdiv z0\.s, p0/m, z0\.s, \1
** ret
*/
-TEST_UNIFORM_Z (div_2_s32_m_tied1, svint32_t,
- z0 = svdiv_n_s32_m (p0, z0, 2),
- z0 = svdiv_m (p0, z0, 2))
+TEST_UNIFORM_Z (div_3_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
/*
-** div_2_s32_m_untied:
-** mov (z[0-9]+\.s), #2
+** div_3_s32_m_untied:
+** mov (z[0-9]+\.s), #3
** movprfx z0, z1
** sdiv z0\.s, p0/m, z0\.s, \1
** ret
*/
-TEST_UNIFORM_Z (div_2_s32_m_untied, svint32_t,
- z0 = svdiv_n_s32_m (p0, z1, 2),
- z0 = svdiv_m (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
/*
** div_s32_z_tied1:
@@ -137,19 +137,19 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, int32_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_s32_z_tied1:
-** mov (z[0-9]+\.s), #2
+** div_3_s32_z_tied1:
+** mov (z[0-9]+\.s), #3
** movprfx z0\.s, p0/z, z0\.s
** sdiv z0\.s, p0/m, z0\.s, \1
** ret
*/
-TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
- z0 = svdiv_n_s32_z (p0, z0, 2),
- z0 = svdiv_z (p0, z0, 2))
+TEST_UNIFORM_Z (div_3_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
/*
-** div_2_s32_z_untied:
-** mov (z[0-9]+\.s), #2
+** div_3_s32_z_untied:
+** mov (z[0-9]+\.s), #3
** (
** movprfx z0\.s, p0/z, z1\.s
** sdiv z0\.s, p0/m, z0\.s, \1
@@ -159,9 +159,9 @@ TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_s32_z_untied, svint32_t,
- z0 = svdiv_n_s32_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
/*
** div_s32_x_tied1:
@@ -217,21 +217,21 @@ TEST_UNIFORM_ZX (div_w0_s32_x_untied, svint32_t, int32_t,
z0 = svdiv_x (p0, z1, x0))
/*
-** div_2_s32_x_tied1:
-** mov (z[0-9]+\.s), #2
+** div_3_s32_x_tied1:
+** mov (z[0-9]+\.s), #3
** sdiv z0\.s, p0/m, z0\.s, \1
** ret
*/
-TEST_UNIFORM_Z (div_2_s32_x_tied1, svint32_t,
- z0 = svdiv_n_s32_x (p0, z0, 2),
- z0 = svdiv_x (p0, z0, 2))
+TEST_UNIFORM_Z (div_3_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
/*
-** div_2_s32_x_untied:
-** mov z0\.s, #2
+** div_3_s32_x_untied:
+** mov z0\.s, #3
** sdivr z0\.s, p0/m, z0\.s, z1\.s
** ret
*/
-TEST_UNIFORM_Z (div_2_s32_x_untied, svint32_t,
- z0 = svdiv_n_s32_x (p0, z1, 2),
- z0 = svdiv_x (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
index 464dca28d74..e4af406344b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
@@ -54,25 +54,25 @@ TEST_UNIFORM_ZX (div_x0_s64_m_untied, svint64_t, int64_t,
z0 = svdiv_m (p0, z1, x0))
/*
-** div_2_s64_m_tied1:
-** mov (z[0-9]+\.d), #2
+** div_3_s64_m_tied1:
+** mov (z[0-9]+\.d), #3
** sdiv z0\.d, p0/m, z0\.d, \1
** ret
*/
-TEST_UNIFORM_Z (div_2_s64_m_tied1, svint64_t,
- z0 = svdiv_n_s64_m (p0, z0, 2),
- z0 = svdiv_m (p0, z0, 2))
+TEST_UNIFORM_Z (div_3_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
/*
-** div_2_s64_m_untied:
-** mov (z[0-9]+\.d), #2
+** div_3_s64_m_untied:
+** mov (z[0-9]+\.d), #3
** movprfx z0, z1
** sdiv z0\.d, p0/m, z0\.d, \1
** ret
*/
-TEST_UNIFORM_Z (div_2_s64_m_untied, svint64_t,
- z0 = svdiv_n_s64_m (p0, z1, 2),
- z0 = svdiv_m (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
/*
** div_s64_z_tied1:
@@ -137,19 +137,19 @@ TEST_UNIFORM_ZX (div_x0_s64_z_untied, svint64_t, int64_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_s64_z_tied1:
-** mov (z[0-9]+\.d), #2
+** div_3_s64_z_tied1:
+** mov (z[0-9]+\.d), #3
** movprfx z0\.d, p0/z, z0\.d
** sdiv z0\.d, p0/m, z0\.d, \1
** ret
*/
-TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
- z0 = svdiv_n_s64_z (p0, z0, 2),
- z0 = svdiv_z (p0, z0, 2))
+TEST_UNIFORM_Z (div_3_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
/*
-** div_2_s64_z_untied:
-** mov (z[0-9]+\.d), #2
+** div_3_s64_z_untied:
+** mov (z[0-9]+\.d), #3
** (
** movprfx z0\.d, p0/z, z1\.d
** sdiv z0\.d, p0/m, z0\.d, \1
@@ -159,9 +159,9 @@ TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_s64_z_untied, svint64_t,
- z0 = svdiv_n_s64_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
/*
** div_s64_x_tied1:
@@ -217,21 +217,21 @@ TEST_UNIFORM_ZX (div_x0_s64_x_untied, svint64_t, int64_t,
z0 = svdiv_x (p0, z1, x0))
/*
-** div_2_s64_x_tied1:
-** mov (z[0-9]+\.d), #2
+** div_3_s64_x_tied1:
+** mov (z[0-9]+\.d), #3
** sdiv z0\.d, p0/m, z0\.d, \1
** ret
*/
-TEST_UNIFORM_Z (div_2_s64_x_tied1, svint64_t,
- z0 = svdiv_n_s64_x (p0, z0, 2),
- z0 = svdiv_x (p0, z0, 2))
+TEST_UNIFORM_Z (div_3_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
/*
-** div_2_s64_x_untied:
-** mov z0\.d, #2
+** div_3_s64_x_untied:
+** mov z0\.d, #3
** sdivr z0\.d, p0/m, z0\.d, z1\.d
** ret
*/
-TEST_UNIFORM_Z (div_2_s64_x_untied, svint64_t,
- z0 = svdiv_n_s64_x (p0, z1, 2),
- z0 = svdiv_x (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
new file mode 100644
index 00000000000..ac6ef1c73d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=128" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
+typedef svint64_t svint64_2 __attribute__((arm_sve_vector_bits(128)));
+typedef svuint64_t svuint64_2 __attribute__((arm_sve_vector_bits(128)));
+
+/*
+** f1:
+** ptrue (p[0-7])\.b, vl16
+** asrd (z[0-9]+\.d), \1/m, \2, #2
+** ret
+*/
+svint64_2 f1 (svint64_2 p)
+{
+ const pred pg = svptrue_b64 ();
+ return svdiv_x (pg, p, (svint64_2) {4, 4});
+}
+
+/*
+** f2:
+** ptrue (p[0-7])\.b, vl16
+** mov (z[0-9]+\.d), #4
+** udiv (z[0-9]+\.d), \1/m, \3, \2
+** ret
+*/
+svuint64_2 f2 (svuint64_2 p)
+{
+ const pred pg = svptrue_b64 ();
+ return svdiv_x (pg, p, (svuint64_2) {4, 4});
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
new file mode 100644
index 00000000000..a15c597d5bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
@@ -0,0 +1,91 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -msve-vector-bits=128" } */
+
+#include <arm_sve.h>
+#include <stdint.h>
+
+typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat16_t svfloat16_ __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat32_t svfloat32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat64_t svfloat64_ __attribute__((arm_sve_vector_bits(128)));
+typedef svint32_t svint32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svint64_t svint64_ __attribute__((arm_sve_vector_bits(128)));
+typedef svuint32_t svuint32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128)));
+
+#define T1(TY, TYS, P) \
+{ \
+ TY##_t a = (TY##_t) 79; \
+ TY##_t b = (TY##_t) 16; \
+ sv##TY##_ res = svdiv_##P (pg, svdup_##TYS (a), svdup_##TYS (b)); \
+ sv##TY##_ exp = svdup_##TYS (a / b); \
+ if (svptest_any (pg, svcmpne (pg, exp, res))) \
+ __builtin_abort (); \
+}
+
+#define T2(B) \
+{ \
+ int##B##_t a[] = {0, -1, 1, INT##B##_MAX, INT##B##_MIN, -5, 5}; \
+ int##B##_t b[] = {-1, 1, -4, 4, -5, 5, INT##B##_MAX, INT##B##_MIN}; \
+ int length_a = sizeof (a) / sizeof (a[0]); \
+ int length_b = sizeof (b) / sizeof (b[0]); \
+ for (int i = 0; i < length_a; ++i) \
+ { \
+ for (int j = 0; j < length_b; ++j) \
+ { \
+ svint##B##_ op1 = svdup_s##B (a[i]); \
+ svint##B##_ op2 = svdup_s##B (b[j]); \
+ svint##B##_ res = svdiv_x (pg, op1, op2); \
+ svint##B##_ exp = svdup_s##B (a[i] / b[j]); \
+ if (svptest_any (pg, svcmpne (pg, exp, res))) \
+ __builtin_abort (); \
+ } \
+ } \
+}
+
+#define TEST_VALUES_ASRD2 \
+{ \
+ svint32_ op1_32 = (svint32_) {0, 16, -79, -1}; \
+ svint32_ op2_32 = (svint32_) {5, 8, -32, 1}; \
+ svint32_ res_32 = svdiv_x (pg, op1_32, op2_32); \
+ svint32_ exp_32 = (svint32_) {0 / 5, 16 / 8, -79 / -32, -1 / 1}; \
+ if (svptest_any (pg, svcmpne (pg, exp_32, res_32))) \
+ __builtin_abort (); \
+ \
+ svint64_ op1_64 = (svint64_) {83, -11}; \
+ svint64_ op2_64 = (svint64_) {16, 5}; \
+ svint64_ res_64 = svdiv_x (pg, op1_64, op2_64); \
+ svint64_ exp_64 = (svint64_) {83 / 16, -11 / 5}; \
+ if (svptest_any (pg, svcmpne (pg, exp_64, res_64))) \
+ __builtin_abort (); \
+}
+
+#define TEST_TYPES(T) \
+ T (float16, f16, x) \
+ T (float32, f32, x) \
+ T (float64, f64, x) \
+ T (int32, s32, x) \
+ T (int64, s64, x) \
+ T (uint32, u32, x) \
+ T (uint64, u64, x) \
+
+#define TEST_PREDICATION(T) \
+ T (int32, s32, z) \
+ T (int32, s32, m) \
+ T (int64, s64, z) \
+ T (int64, s64, m) \
+
+#define TEST_VALUES_ASRD1(T) \
+ T (32) \
+ T (64) \
+
+int
+main (void)
+{
+ const pred pg = svptrue_b64 ();
+ TEST_TYPES (T1)
+ TEST_PREDICATION (T1)
+ TEST_VALUES_ASRD1 (T2)
+ TEST_VALUES_ASRD2
+ return 0;
+}
--
2.44.0
* Re: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.
2024-07-17 7:02 [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant Jennifer Schmitz
@ 2024-07-17 7:57 ` Richard Sandiford
2024-07-29 14:07 ` Jennifer Schmitz
0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2024-07-17 7:57 UTC (permalink / raw)
To: Jennifer Schmitz; +Cc: gcc-patches, Kyrylo Tkachov
Jennifer Schmitz <jschmitz@nvidia.com> writes:
> This patch folds signed SVE division where all divisor elements are the same
> power of 2 to svasrd. Tests were added to check 1) whether the transform is
> applied, i.e. asrd is used, and 2) correctness for all possible input types
> for svdiv, predication, and a variety of values. As the transform is applied
> only to signed integers, correctness for predication and values was only
> tested for svint32_t and svint64_t.
> Existing svdiv tests were adjusted such that the divisor is no longer a
> power of 2.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>
> gcc/
>
> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
> fold and expand.
>
> gcc/testsuite/
>
> * gcc.target/aarch64/sve/div_const_1.c: New test.
> * gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
> * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>
> From e8ffbab52ad7b9307cbfc9dbca4ef4d20e08804b Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz <jschmitz@nvidia.com>
> Date: Tue, 16 Jul 2024 01:59:50 -0700
> Subject: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by
> constant.
>
> This patch folds signed SVE division where all divisor elements are the same
> power of 2 to svasrd. Tests were added to check 1) whether the transform is
> applied, i.e. asrd is used, and 2) correctness for all possible input types
> for svdiv, predication, and a variety of values. As the transform is applied
> only to signed integers, correctness for predication and values was only
> tested for svint32_t and svint64_t.
> Existing svdiv tests were adjusted such that the divisor is no longer a
> power of 2.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>
> gcc/
>
> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
> fold and expand.
>
> gcc/testsuite/
>
> * gcc.target/aarch64/sve/div_const_1.c: New test.
> * gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
> * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
> ---
> .../aarch64/aarch64-sve-builtins-base.cc | 44 ++++++++-
> .../gcc.target/aarch64/sve/acle/asm/div_s32.c | 60 ++++++------
> .../gcc.target/aarch64/sve/acle/asm/div_s64.c | 60 ++++++------
> .../gcc.target/aarch64/sve/div_const_1.c | 34 +++++++
> .../gcc.target/aarch64/sve/div_const_1_run.c | 91 +++++++++++++++++++
> 5 files changed, 228 insertions(+), 61 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index aa26370d397..d821cc96588 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -746,6 +746,48 @@ public:
> }
> };
>
> +class svdiv_impl : public unspec_based_function
> +{
> +public:
> + CONSTEXPR svdiv_impl ()
> + : unspec_based_function (DIV, UDIV, UNSPEC_COND_FDIV) {}
> +
> + gimple *
> + fold (gimple_folder &f) const override
> + {
> + tree divisor = gimple_call_arg (f.call, 2);
> + tree divisor_cst = uniform_integer_cst_p (divisor);
> +
> + if (f.type_suffix (0).unsigned_p)
> + {
> + return NULL;
> + }
We might as well test this first, since it doesn't depend on the
divisor_cst result.
Formatting nit: should be no braces for single statements, so:
  if (f.type_suffix (0).unsigned_p)
    return NULL;
Same for the others.
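Applying both of those points, the start of fold would then look something like this (sketch of the signed-only version):

  if (f.type_suffix (0).unsigned_p)
    return NULL;

  tree divisor = gimple_call_arg (f.call, 2);
  tree divisor_cst = uniform_integer_cst_p (divisor);

  if (!divisor_cst)
    return NULL;

  if (!integer_pow2p (divisor_cst))
    return NULL;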
> +
> + if (!divisor_cst)
> + {
> + return NULL;
> + }
> +
> + if (!integer_pow2p (divisor_cst))
> + {
> + return NULL;
> + }
> +
> + function_instance instance ("svasrd", functions::svasrd, shapes::shift_right_imm, MODE_n, f.type_suffix_ids, GROUP_none, f.pred);
This line is above the 80 character limit. Maybe:
function_instance instance ("svasrd", functions::svasrd,
shapes::shift_right_imm, MODE_n,
f.type_suffix_ids, GROUP_none, f.pred);
> + gcall *call = as_a <gcall *> (f.redirect_call (instance));
Looks like an oversight that redirect_call doesn't return a gcall directly.
IMO it'd be better to fix that instead.
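For reference, a sketch of the caller if redirect_call were changed to return gcall * directly (as the revised patch later in this thread assumes):

  gcall *call = f.redirect_call (instance);
  gimple_call_set_arg (call, 2, shift_amt);
  return call;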
> + tree shift_amt = wide_int_to_tree (TREE_TYPE (divisor_cst), tree_log2 (divisor_cst));
This ought to have type uint64_t instead, to match the function prototype.
That can be had from scalar_types[VECTOR_TYPE_svuint64_t].
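For example (sketch; this is what the revised patch below ends up using):

  tree shift_amt = wide_int_to_tree (scalar_types[VECTOR_TYPE_svuint64_t],
				     tree_log2 (divisor_cst));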
> + gimple_call_set_arg (call, 2, shift_amt);
> + return call;
> + }
> +
> + rtx
> + expand (function_expander &e) const override
> + {
> + return e.map_to_rtx_codes (DIV, UDIV, UNSPEC_COND_FDIV, -1, DEFAULT_MERGE_ARGNO);
> + }
This shouldn't be necessary, given the inheritance from unspec_based_function.
> +};
> +
> +
> class svdot_impl : public function_base
> {
> public:
> @@ -3043,7 +3085,7 @@ FUNCTION (svcreate3, svcreate_impl, (3))
> FUNCTION (svcreate4, svcreate_impl, (4))
> FUNCTION (svcvt, svcvt_impl,)
> FUNCTION (svcvtnt, CODE_FOR_MODE0 (aarch64_sve_cvtnt),)
> -FUNCTION (svdiv, rtx_code_function, (DIV, UDIV, UNSPEC_COND_FDIV))
> +FUNCTION (svdiv, svdiv_impl,)
> FUNCTION (svdivr, rtx_code_function_rotated, (DIV, UDIV, UNSPEC_COND_FDIV))
> FUNCTION (svdot, svdot_impl,)
> FUNCTION (svdot_lane, svdotprod_lane_impl, (UNSPEC_SDOT, UNSPEC_UDOT,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
> index c49ca1aa524..da2fe7c5451 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
> @@ -54,25 +54,25 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
> z0 = svdiv_m (p0, z1, x0))
>
> /*
> -** div_2_s32_m_tied1:
> -** mov (z[0-9]+\.s), #2
> +** div_3_s32_m_tied1:
> +** mov (z[0-9]+\.s), #3
> ** sdiv z0\.s, p0/m, z0\.s, \1
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s32_m_tied1, svint32_t,
> - z0 = svdiv_n_s32_m (p0, z0, 2),
> - z0 = svdiv_m (p0, z0, 2))
> +TEST_UNIFORM_Z (div_3_s32_m_tied1, svint32_t,
> + z0 = svdiv_n_s32_m (p0, z0, 3),
> + z0 = svdiv_m (p0, z0, 3))
I think we should test both 2 and 3, using this harness to make sure
that svdiv of 2 does become svasrd. (Especially since the new test
is specific to fixed-length vectors.)
It would be good to test the limits too: 1 and 1<<30. Presumably
0b1000... (-1<<31) shouldn't be optimised, so we should test that too.
Same idea (with adjusted limits) for s64.
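For instance, one of the added s32 cases could look like this (sketch; the expected asm assumes the asrd fold is in place):

/*
** div_2_s32_m_tied1:
**	asrd	z0\.s, p0/m, z0\.s, #1
**	ret
*/
TEST_UNIFORM_Z (div_2_s32_m_tied1, svint32_t,
		z0 = svdiv_n_s32_m (p0, z0, 2),
		z0 = svdiv_m (p0, z0, 2))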
Thanks,
Richard
>
> /*
> -** div_2_s32_m_untied:
> -** mov (z[0-9]+\.s), #2
> +** div_3_s32_m_untied:
> +** mov (z[0-9]+\.s), #3
> ** movprfx z0, z1
> ** sdiv z0\.s, p0/m, z0\.s, \1
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s32_m_untied, svint32_t,
> - z0 = svdiv_n_s32_m (p0, z1, 2),
> - z0 = svdiv_m (p0, z1, 2))
> +TEST_UNIFORM_Z (div_3_s32_m_untied, svint32_t,
> + z0 = svdiv_n_s32_m (p0, z1, 3),
> + z0 = svdiv_m (p0, z1, 3))
>
> /*
> ** div_s32_z_tied1:
> @@ -137,19 +137,19 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, int32_t,
> z0 = svdiv_z (p0, z1, x0))
>
> /*
> -** div_2_s32_z_tied1:
> -** mov (z[0-9]+\.s), #2
> +** div_3_s32_z_tied1:
> +** mov (z[0-9]+\.s), #3
> ** movprfx z0\.s, p0/z, z0\.s
> ** sdiv z0\.s, p0/m, z0\.s, \1
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
> - z0 = svdiv_n_s32_z (p0, z0, 2),
> - z0 = svdiv_z (p0, z0, 2))
> +TEST_UNIFORM_Z (div_3_s32_z_tied1, svint32_t,
> + z0 = svdiv_n_s32_z (p0, z0, 3),
> + z0 = svdiv_z (p0, z0, 3))
>
> /*
> -** div_2_s32_z_untied:
> -** mov (z[0-9]+\.s), #2
> +** div_3_s32_z_untied:
> +** mov (z[0-9]+\.s), #3
> ** (
> ** movprfx z0\.s, p0/z, z1\.s
> ** sdiv z0\.s, p0/m, z0\.s, \1
> @@ -159,9 +159,9 @@ TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
> ** )
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s32_z_untied, svint32_t,
> - z0 = svdiv_n_s32_z (p0, z1, 2),
> - z0 = svdiv_z (p0, z1, 2))
> +TEST_UNIFORM_Z (div_3_s32_z_untied, svint32_t,
> + z0 = svdiv_n_s32_z (p0, z1, 3),
> + z0 = svdiv_z (p0, z1, 3))
>
> /*
> ** div_s32_x_tied1:
> @@ -217,21 +217,21 @@ TEST_UNIFORM_ZX (div_w0_s32_x_untied, svint32_t, int32_t,
> z0 = svdiv_x (p0, z1, x0))
>
> /*
> -** div_2_s32_x_tied1:
> -** mov (z[0-9]+\.s), #2
> +** div_3_s32_x_tied1:
> +** mov (z[0-9]+\.s), #3
> ** sdiv z0\.s, p0/m, z0\.s, \1
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s32_x_tied1, svint32_t,
> - z0 = svdiv_n_s32_x (p0, z0, 2),
> - z0 = svdiv_x (p0, z0, 2))
> +TEST_UNIFORM_Z (div_3_s32_x_tied1, svint32_t,
> + z0 = svdiv_n_s32_x (p0, z0, 3),
> + z0 = svdiv_x (p0, z0, 3))
>
> /*
> -** div_2_s32_x_untied:
> -** mov z0\.s, #2
> +** div_3_s32_x_untied:
> +** mov z0\.s, #3
> ** sdivr z0\.s, p0/m, z0\.s, z1\.s
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s32_x_untied, svint32_t,
> - z0 = svdiv_n_s32_x (p0, z1, 2),
> - z0 = svdiv_x (p0, z1, 2))
> +TEST_UNIFORM_Z (div_3_s32_x_untied, svint32_t,
> + z0 = svdiv_n_s32_x (p0, z1, 3),
> + z0 = svdiv_x (p0, z1, 3))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
> index 464dca28d74..e4af406344b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
> @@ -54,25 +54,25 @@ TEST_UNIFORM_ZX (div_x0_s64_m_untied, svint64_t, int64_t,
> z0 = svdiv_m (p0, z1, x0))
>
> /*
> -** div_2_s64_m_tied1:
> -** mov (z[0-9]+\.d), #2
> +** div_3_s64_m_tied1:
> +** mov (z[0-9]+\.d), #3
> ** sdiv z0\.d, p0/m, z0\.d, \1
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s64_m_tied1, svint64_t,
> - z0 = svdiv_n_s64_m (p0, z0, 2),
> - z0 = svdiv_m (p0, z0, 2))
> +TEST_UNIFORM_Z (div_3_s64_m_tied1, svint64_t,
> + z0 = svdiv_n_s64_m (p0, z0, 3),
> + z0 = svdiv_m (p0, z0, 3))
>
> /*
> -** div_2_s64_m_untied:
> -** mov (z[0-9]+\.d), #2
> +** div_3_s64_m_untied:
> +** mov (z[0-9]+\.d), #3
> ** movprfx z0, z1
> ** sdiv z0\.d, p0/m, z0\.d, \1
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s64_m_untied, svint64_t,
> - z0 = svdiv_n_s64_m (p0, z1, 2),
> - z0 = svdiv_m (p0, z1, 2))
> +TEST_UNIFORM_Z (div_3_s64_m_untied, svint64_t,
> + z0 = svdiv_n_s64_m (p0, z1, 3),
> + z0 = svdiv_m (p0, z1, 3))
>
> /*
> ** div_s64_z_tied1:
> @@ -137,19 +137,19 @@ TEST_UNIFORM_ZX (div_x0_s64_z_untied, svint64_t, int64_t,
> z0 = svdiv_z (p0, z1, x0))
>
> /*
> -** div_2_s64_z_tied1:
> -** mov (z[0-9]+\.d), #2
> +** div_3_s64_z_tied1:
> +** mov (z[0-9]+\.d), #3
> ** movprfx z0\.d, p0/z, z0\.d
> ** sdiv z0\.d, p0/m, z0\.d, \1
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
> - z0 = svdiv_n_s64_z (p0, z0, 2),
> - z0 = svdiv_z (p0, z0, 2))
> +TEST_UNIFORM_Z (div_3_s64_z_tied1, svint64_t,
> + z0 = svdiv_n_s64_z (p0, z0, 3),
> + z0 = svdiv_z (p0, z0, 3))
>
> /*
> -** div_2_s64_z_untied:
> -** mov (z[0-9]+\.d), #2
> +** div_3_s64_z_untied:
> +** mov (z[0-9]+\.d), #3
> ** (
> ** movprfx z0\.d, p0/z, z1\.d
> ** sdiv z0\.d, p0/m, z0\.d, \1
> @@ -159,9 +159,9 @@ TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
> ** )
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s64_z_untied, svint64_t,
> - z0 = svdiv_n_s64_z (p0, z1, 2),
> - z0 = svdiv_z (p0, z1, 2))
> +TEST_UNIFORM_Z (div_3_s64_z_untied, svint64_t,
> + z0 = svdiv_n_s64_z (p0, z1, 3),
> + z0 = svdiv_z (p0, z1, 3))
>
> /*
> ** div_s64_x_tied1:
> @@ -217,21 +217,21 @@ TEST_UNIFORM_ZX (div_x0_s64_x_untied, svint64_t, int64_t,
> z0 = svdiv_x (p0, z1, x0))
>
> /*
> -** div_2_s64_x_tied1:
> -** mov (z[0-9]+\.d), #2
> +** div_3_s64_x_tied1:
> +** mov (z[0-9]+\.d), #3
> ** sdiv z0\.d, p0/m, z0\.d, \1
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s64_x_tied1, svint64_t,
> - z0 = svdiv_n_s64_x (p0, z0, 2),
> - z0 = svdiv_x (p0, z0, 2))
> +TEST_UNIFORM_Z (div_3_s64_x_tied1, svint64_t,
> + z0 = svdiv_n_s64_x (p0, z0, 3),
> + z0 = svdiv_x (p0, z0, 3))
>
> /*
> -** div_2_s64_x_untied:
> -** mov z0\.d, #2
> +** div_3_s64_x_untied:
> +** mov z0\.d, #3
> ** sdivr z0\.d, p0/m, z0\.d, z1\.d
> ** ret
> */
> -TEST_UNIFORM_Z (div_2_s64_x_untied, svint64_t,
> - z0 = svdiv_n_s64_x (p0, z1, 2),
> - z0 = svdiv_x (p0, z1, 2))
> +TEST_UNIFORM_Z (div_3_s64_x_untied, svint64_t,
> + z0 = svdiv_n_s64_x (p0, z1, 3),
> + z0 = svdiv_x (p0, z1, 3))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
> new file mode 100644
> index 00000000000..ac6ef1c73d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msve-vector-bits=128" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include <arm_sve.h>
> +
> +typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
> +typedef svint64_t svint64_2 __attribute__((arm_sve_vector_bits(128)));
> +typedef svuint64_t svuint64_2 __attribute__((arm_sve_vector_bits(128)));
> +
> +/*
> +** f1:
> +** ptrue (p[0-7])\.b, vl16
> +** asrd (z[0-9]+\.d), \1/m, \2, #2
> +** ret
> +*/
> +svint64_2 f1 (svint64_2 p)
> +{
> + const pred pg = svptrue_b64 ();
> + return svdiv_x (pg, p, (svint64_2) {4, 4});
> +}
> +
> +/*
> +** f2:
> +** ptrue (p[0-7])\.b, vl16
> +** mov (z[0-9]+\.d), #4
> +** udiv (z[0-9]+\.d), \1/m, \3, \2
> +** ret
> +*/
> +svuint64_2 f2 (svuint64_2 p)
> +{
> + const pred pg = svptrue_b64 ();
> + return svdiv_x (pg, p, (svuint64_2) {4, 4});
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
> new file mode 100644
> index 00000000000..a15c597d5bd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
> @@ -0,0 +1,91 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -msve-vector-bits=128" } */
> +
> +#include <arm_sve.h>
> +#include <stdint.h>
> +
> +typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
> +typedef svfloat16_t svfloat16_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svfloat32_t svfloat32_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svfloat64_t svfloat64_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svint32_t svint32_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svint64_t svint64_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svuint32_t svuint32_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128)));
> +
> +#define T1(TY, TYS, P) \
> +{ \
> + TY##_t a = (TY##_t) 79; \
> + TY##_t b = (TY##_t) 16; \
> + sv##TY##_ res = svdiv_##P (pg, svdup_##TYS (a), svdup_##TYS (b)); \
> + sv##TY##_ exp = svdup_##TYS (a / b); \
> + if (svptest_any (pg, svcmpne (pg, exp, res))) \
> + __builtin_abort (); \
> +}
> +
> +#define T2(B) \
> +{ \
> + int##B##_t a[] = {0, -1, 1, INT##B##_MAX, INT##B##_MIN, -5, 5}; \
> + int##B##_t b[] = {-1, 1, -4, 4, -5, 5, INT##B##_MAX, INT##B##_MIN}; \
> + int length_a = sizeof (a) / sizeof (a[0]); \
> + int length_b = sizeof (b) / sizeof (b[0]); \
> + for (int i = 0; i < length_a; ++i) \
> + { \
> + for (int j = 0; j < length_b; ++j) \
> + { \
> + svint##B##_ op1 = svdup_s##B (a[i]); \
> + svint##B##_ op2 = svdup_s##B (b[j]); \
> + svint##B##_ res = svdiv_x (pg, op1, op2); \
> + svint##B##_ exp = svdup_s##B (a[i] / b[j]); \
> + if (svptest_any (pg, svcmpne (pg, exp, res))) \
> + __builtin_abort (); \
> + } \
> + } \
> +}
> +
> +#define TEST_VALUES_ASRD2 \
> +{ \
> + svint32_ op1_32 = (svint32_) {0, 16, -79, -1}; \
> + svint32_ op2_32 = (svint32_) {5, 8, -32, 1}; \
> + svint32_ res_32 = svdiv_x (pg, op1_32, op2_32); \
> + svint32_ exp_32 = (svint32_) {0 / 5, 16 / 8, -79 / -32, -1 / 1}; \
> + if (svptest_any (pg, svcmpne (pg, exp_32, res_32))) \
> + __builtin_abort (); \
> + \
> + svint64_ op1_64 = (svint64_) {83, -11}; \
> + svint64_ op2_64 = (svint64_) {16, 5}; \
> + svint64_ res_64 = svdiv_x (pg, op1_64, op2_64); \
> + svint64_ exp_64 = (svint64_) {83 / 16, -11 / 5}; \
> + if (svptest_any (pg, svcmpne (pg, exp_64, res_64))) \
> + __builtin_abort (); \
> +}
> +
> +#define TEST_TYPES(T) \
> + T (float16, f16, x) \
> + T (float32, f32, x) \
> + T (float64, f64, x) \
> + T (int32, s32, x) \
> + T (int64, s64, x) \
> + T (uint32, u32, x) \
> + T (uint64, u64, x) \
> +
> +#define TEST_PREDICATION(T) \
> + T (int32, s32, z) \
> + T (int32, s32, m) \
> + T (int64, s64, z) \
> + T (int64, s64, m) \
> +
> +#define TEST_VALUES_ASRD1(T) \
> + T (32) \
> + T (64) \
> +
> +int
> +main (void)
> +{
> + const pred pg = svptrue_b64 ();
> + TEST_TYPES (T1)
> + TEST_PREDICATION (T1)
> + TEST_VALUES_ASRD1 (T2)
> + TEST_VALUES_ASRD2
> + return 0;
> +}
* Re: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.
2024-07-17 7:57 ` Richard Sandiford
@ 2024-07-29 14:07 ` Jennifer Schmitz
2024-07-29 20:55 ` Richard Sandiford
0 siblings, 1 reply; 6+ messages in thread
From: Jennifer Schmitz @ 2024-07-29 14:07 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Kyrylo Tkachov
Dear Richard,
I revised the patch according to your comments and also implemented the transform for unsigned division; more comments inline below.
The new patch was bootstrapped and tested again.
Looking forward to your feedback.
Thanks,
Jennifer
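To illustrate the new unsigned handling (example only, mirroring the new div_u64.c tests in the patch below): a call such as

  svuint64_t f (svbool_t pg, svuint64_t x)
  {
    /* Expected to be folded to svlsr_n_u64_x (pg, x, 2), i.e. a single
       lsr instruction instead of mov + udiv.  */
    return svdiv_n_u64_x (pg, x, 4);
  }

now uses a logical right shift by 2 rather than a division.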
[-- Attachment #1.2: 0001-SVE-intrinsics-Add-strength-reduction-for-division-b.patch --]
[-- Type: application/octet-stream, Size: 38019 bytes --]
From 0198ca15cd892e5d9a495c7f364af458cb7011d2 Mon Sep 17 00:00:00 2001
From: Jennifer Schmitz <jschmitz@nvidia.com>
Date: Tue, 16 Jul 2024 01:59:50 -0700
Subject: [PATCH] SVE intrinsics: Add strength reduction for division by
constant.
This patch folds SVE division where all divisor elements are the same
power of 2 to svasrd (signed) or svlsr (unsigned).
Tests were added to check
1) whether the transform is applied (existing test harness was amended), and
2) correctness using runtime tests for all input types of svdiv; for signed
and unsigned integers, several corner cases were covered.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Implement strength reduction.
gcc/testsuite/
* gcc.target/aarch64/sve/div_const_run.c: New test.
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
---
.../aarch64/aarch64-sve-builtins-base.cc | 49 +++-
.../gcc.target/aarch64/sve/acle/asm/div_s32.c | 274 +++++++++++++++++-
.../gcc.target/aarch64/sve/acle/asm/div_s64.c | 274 +++++++++++++++++-
.../gcc.target/aarch64/sve/acle/asm/div_u32.c | 201 ++++++++++++-
.../gcc.target/aarch64/sve/acle/asm/div_u64.c | 201 ++++++++++++-
.../gcc.target/aarch64/sve/div_const_run.c | 91 ++++++
6 files changed, 1033 insertions(+), 57 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index aa26370d397..41a7b4cd861 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -746,6 +746,53 @@ public:
}
};
+class svdiv_impl : public rtx_code_function
+{
+public:
+  CONSTEXPR svdiv_impl ()
+    : rtx_code_function (DIV, UDIV, UNSPEC_COND_FDIV) {}
+
+  gimple *
+  fold (gimple_folder &f) const override
+  {
+    tree divisor = gimple_call_arg (f.call, 2);
+    tree divisor_cst = uniform_integer_cst_p (divisor);
+
+    if (!divisor_cst || !integer_pow2p (divisor_cst))
+      return NULL;
+
+    tree new_divisor;
+    gcall *call;
+
+    if (f.type_suffix (0).unsigned_p && tree_to_uhwi (divisor_cst) != 1)
+      {
+        function_instance instance ("svlsr", functions::svlsr,
+                                    shapes::binary_uint_opt_n, MODE_n,
+                                    f.type_suffix_ids, GROUP_none, f.pred);
+        call = f.redirect_call (instance);
+        tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
+        new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
+      }
+    else
+      {
+        if (tree_int_cst_sign_bit (divisor_cst)
+            || tree_to_shwi (divisor_cst) == 1)
+          return NULL;
+
+        function_instance instance ("svasrd", functions::svasrd,
+                                    shapes::shift_right_imm, MODE_n,
+                                    f.type_suffix_ids, GROUP_none, f.pred);
+        call = f.redirect_call (instance);
+        new_divisor = wide_int_to_tree (scalar_types[VECTOR_TYPE_svuint64_t],
+                                        tree_log2 (divisor_cst));
+      }
+
+    gimple_call_set_arg (call, 2, new_divisor);
+    return call;
+  }
+};
+
+
class svdot_impl : public function_base
{
public:
@@ -3043,7 +3090,7 @@ FUNCTION (svcreate3, svcreate_impl, (3))
FUNCTION (svcreate4, svcreate_impl, (4))
FUNCTION (svcvt, svcvt_impl,)
FUNCTION (svcvtnt, CODE_FOR_MODE0 (aarch64_sve_cvtnt),)
-FUNCTION (svdiv, rtx_code_function, (DIV, UDIV, UNSPEC_COND_FDIV))
+FUNCTION (svdiv, svdiv_impl,)
FUNCTION (svdivr, rtx_code_function_rotated, (DIV, UDIV, UNSPEC_COND_FDIV))
FUNCTION (svdot, svdot_impl,)
FUNCTION (svdot_lane, svdotprod_lane_impl, (UNSPEC_SDOT, UNSPEC_UDOT,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
index c49ca1aa524..6500b64c41b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
@@ -1,6 +1,9 @@
/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
#include "test_sve_acle.h"
+#include <stdint.h>
+
+#define MAXPOW 1<<30
/*
** div_s32_m_tied1:
@@ -53,10 +56,27 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
z0 = svdiv_n_s32_m (p0, z1, x0),
z0 = svdiv_m (p0, z1, x0))
+/*
+** div_1_s32_m_tied1:
+** sel z0\.s, p0, z0\.s, z0\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, 1),
+ z0 = svdiv_m (p0, z0, 1))
+
+/*
+** div_1_s32_m_untied:
+** sel z0\.s, p0, z1\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, 1),
+ z0 = svdiv_m (p0, z1, 1))
+
/*
** div_2_s32_m_tied1:
-** mov (z[0-9]+\.s), #2
-** sdiv z0\.s, p0/m, z0\.s, \1
+** asrd z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s32_m_tied1, svint32_t,
@@ -65,15 +85,75 @@ TEST_UNIFORM_Z (div_2_s32_m_tied1, svint32_t,
/*
** div_2_s32_m_untied:
-** mov (z[0-9]+\.s), #2
** movprfx z0, z1
-** sdiv z0\.s, p0/m, z0\.s, \1
+** asrd z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s32_m_untied, svint32_t,
z0 = svdiv_n_s32_m (p0, z1, 2),
z0 = svdiv_m (p0, z1, 2))
+/*
+** div_3_s32_m_tied1:
+** mov (z[0-9]+\.s), #3
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
+
+/*
+** div_3_s32_m_untied:
+** mov (z[0-9]+\.s), #3
+** movprfx z0, z1
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
+
+/*
+** div_maxpow_s32_m_tied1:
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, MAXPOW),
+ z0 = svdiv_m (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s32_m_untied:
+** movprfx z0, z1
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, MAXPOW),
+ z0 = svdiv_m (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s32_m_tied1:
+** mov (z[0-9]+\.s), #-2147483648
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, INT32_MIN),
+ z0 = svdiv_m (p0, z0, INT32_MIN))
+
+/*
+** div_intmin_s32_m_untied:
+** mov (z[0-9]+\.s), #-2147483648
+** movprfx z0, z1
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, INT32_MIN),
+ z0 = svdiv_m (p0, z1, INT32_MIN))
+
/*
** div_s32_z_tied1:
** movprfx z0\.s, p0/z, z0\.s
@@ -137,19 +217,61 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, int32_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_s32_z_tied1:
-** mov (z[0-9]+\.s), #2
+** div_1_s32_z_tied1:
+** mov (z[0-9]+\.s), #1
** movprfx z0\.s, p0/z, z0\.s
** sdiv z0\.s, p0/m, z0\.s, \1
** ret
*/
+TEST_UNIFORM_Z (div_1_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, 1),
+ z0 = svdiv_z (p0, z0, 1))
+
+/*
+** div_1_s32_z_untied:
+** mov z0\.s, #1
+** movprfx z0\.s, p0/z, z0\.s
+** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, 1),
+ z0 = svdiv_z (p0, z1, 1))
+
+/*
+** div_2_s32_z_tied1:
+** movprfx z0\.s, p0/z, z0\.s
+** asrd z0\.s, p0/m, z0\.s, #1
+** ret
+*/
TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
z0 = svdiv_n_s32_z (p0, z0, 2),
z0 = svdiv_z (p0, z0, 2))
/*
** div_2_s32_z_untied:
-** mov (z[0-9]+\.s), #2
+** movprfx z0\.s, p0/z, z1\.s
+** asrd z0\.s, p0/m, z0\.s, #1
+** ret
+*/
+TEST_UNIFORM_Z (div_2_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, 2),
+ z0 = svdiv_z (p0, z1, 2))
+
+/*
+** div_3_s32_z_tied1:
+** mov (z[0-9]+\.s), #3
+** movprfx z0\.s, p0/z, z0\.s
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
+
+/*
+** div_3_s32_z_untied:
+** mov (z[0-9]+\.s), #3
** (
** movprfx z0\.s, p0/z, z1\.s
** sdiv z0\.s, p0/m, z0\.s, \1
@@ -159,9 +281,56 @@ TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_s32_z_untied, svint32_t,
- z0 = svdiv_n_s32_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
+
+/*
+** div_maxpow_s32_z_tied1:
+** movprfx z0\.s, p0/z, z0\.s
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, MAXPOW),
+ z0 = svdiv_z (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s32_z_untied:
+** movprfx z0\.s, p0/z, z1\.s
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, MAXPOW),
+ z0 = svdiv_z (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s32_z_tied1:
+** mov (z[0-9]+\.s), #-2147483648
+** movprfx z0\.s, p0/z, z0\.s
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, INT32_MIN),
+ z0 = svdiv_z (p0, z0, INT32_MIN))
+
+/*
+** div_intmin_s32_z_untied:
+** mov (z[0-9]+\.s), #-2147483648
+** (
+** movprfx z0\.s, p0/z, z1\.s
+** sdiv z0\.s, p0/m, z0\.s, \1
+** |
+** movprfx z0\.s, p0/z, \1
+** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** )
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, INT32_MIN),
+ z0 = svdiv_z (p0, z1, INT32_MIN))
/*
** div_s32_x_tied1:
@@ -216,10 +385,26 @@ TEST_UNIFORM_ZX (div_w0_s32_x_untied, svint32_t, int32_t,
z0 = svdiv_n_s32_x (p0, z1, x0),
z0 = svdiv_x (p0, z1, x0))
+/*
+** div_1_s32_x_tied1:
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, 1),
+ z0 = svdiv_x (p0, z0, 1))
+
+/*
+** div_1_s32_x_untied:
+** mov z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, 1),
+ z0 = svdiv_x (p0, z1, 1))
+
/*
** div_2_s32_x_tied1:
-** mov (z[0-9]+\.s), #2
-** sdiv z0\.s, p0/m, z0\.s, \1
+** asrd z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s32_x_tied1, svint32_t,
@@ -228,10 +413,71 @@ TEST_UNIFORM_Z (div_2_s32_x_tied1, svint32_t,
/*
** div_2_s32_x_untied:
-** mov z0\.s, #2
-** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** movprfx z0, z1
+** asrd z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s32_x_untied, svint32_t,
z0 = svdiv_n_s32_x (p0, z1, 2),
z0 = svdiv_x (p0, z1, 2))
+
+/*
+** div_3_s32_x_tied1:
+** mov (z[0-9]+\.s), #3
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
+
+/*
+** div_3_s32_x_untied:
+** mov z0\.s, #3
+** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
+
+/*
+** div_maxpow_s32_x_tied1:
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, MAXPOW),
+ z0 = svdiv_x (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s32_x_untied:
+** movprfx z0, z1
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, MAXPOW),
+ z0 = svdiv_x (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s32_x_tied1:
+** mov (z[0-9]+\.s), #-2147483648
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, INT32_MIN),
+ z0 = svdiv_x (p0, z0, INT32_MIN))
+
+/*
+** div_intmin_s32_x_untied:
+** mov z0\.s, #-2147483648
+** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, INT32_MIN),
+ z0 = svdiv_x (p0, z1, INT32_MIN))
+
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
index 464dca28d74..d7188640b42 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
@@ -1,6 +1,9 @@
/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
#include "test_sve_acle.h"
+#include <stdint.h>
+
+#define MAXPOW 1ULL<<62
/*
** div_s64_m_tied1:
@@ -53,10 +56,27 @@ TEST_UNIFORM_ZX (div_x0_s64_m_untied, svint64_t, int64_t,
z0 = svdiv_n_s64_m (p0, z1, x0),
z0 = svdiv_m (p0, z1, x0))
+/*
+** div_1_s64_m_tied1:
+** sel z0\.d, p0, z0\.d, z0\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, 1),
+ z0 = svdiv_m (p0, z0, 1))
+
+/*
+** div_1_s64_m_untied:
+** sel z0\.d, p0, z1\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, 1),
+ z0 = svdiv_m (p0, z1, 1))
+
/*
** div_2_s64_m_tied1:
-** mov (z[0-9]+\.d), #2
-** sdiv z0\.d, p0/m, z0\.d, \1
+** asrd z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s64_m_tied1, svint64_t,
@@ -65,15 +85,75 @@ TEST_UNIFORM_Z (div_2_s64_m_tied1, svint64_t,
/*
** div_2_s64_m_untied:
-** mov (z[0-9]+\.d), #2
** movprfx z0, z1
-** sdiv z0\.d, p0/m, z0\.d, \1
+** asrd z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s64_m_untied, svint64_t,
z0 = svdiv_n_s64_m (p0, z1, 2),
z0 = svdiv_m (p0, z1, 2))
+/*
+** div_3_s64_m_tied1:
+** mov (z[0-9]+\.d), #3
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
+
+/*
+** div_3_s64_m_untied:
+** mov (z[0-9]+\.d), #3
+** movprfx z0, z1
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
+
+/*
+** div_maxpow_s64_m_tied1:
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, MAXPOW),
+ z0 = svdiv_m (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s64_m_untied:
+** movprfx z0, z1
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, MAXPOW),
+ z0 = svdiv_m (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s64_m_tied1:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, INT64_MIN),
+ z0 = svdiv_m (p0, z0, INT64_MIN))
+
+/*
+** div_intmin_s64_m_untied:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** movprfx z0, z1
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, INT64_MIN),
+ z0 = svdiv_m (p0, z1, INT64_MIN))
+
/*
** div_s64_z_tied1:
** movprfx z0\.d, p0/z, z0\.d
@@ -137,19 +217,61 @@ TEST_UNIFORM_ZX (div_x0_s64_z_untied, svint64_t, int64_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_s64_z_tied1:
-** mov (z[0-9]+\.d), #2
+** div_1_s64_z_tied1:
+** mov (z[0-9]+\.d), #1
** movprfx z0\.d, p0/z, z0\.d
** sdiv z0\.d, p0/m, z0\.d, \1
** ret
*/
+TEST_UNIFORM_Z (div_1_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, 1),
+ z0 = svdiv_z (p0, z0, 1))
+
+/*
+** div_1_s64_z_untied:
+** mov z0\.d, #1
+** movprfx z0\.d, p0/z, z0\.d
+** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, 1),
+ z0 = svdiv_z (p0, z1, 1))
+
+/*
+** div_2_s64_z_tied1:
+** movprfx z0\.d, p0/z, z0\.d
+** asrd z0\.d, p0/m, z0\.d, #1
+** ret
+*/
TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
z0 = svdiv_n_s64_z (p0, z0, 2),
z0 = svdiv_z (p0, z0, 2))
/*
** div_2_s64_z_untied:
-** mov (z[0-9]+\.d), #2
+** movprfx z0\.d, p0/z, z1\.d
+** asrd z0\.d, p0/m, z0\.d, #1
+** ret
+*/
+TEST_UNIFORM_Z (div_2_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, 2),
+ z0 = svdiv_z (p0, z1, 2))
+
+/*
+** div_3_s64_z_tied1:
+** mov (z[0-9]+\.d), #3
+** movprfx z0\.d, p0/z, z0\.d
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
+
+/*
+** div_3_s64_z_untied:
+** mov (z[0-9]+\.d), #3
** (
** movprfx z0\.d, p0/z, z1\.d
** sdiv z0\.d, p0/m, z0\.d, \1
@@ -159,9 +281,56 @@ TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_s64_z_untied, svint64_t,
- z0 = svdiv_n_s64_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
+
+/*
+** div_maxpow_s64_z_tied1:
+** movprfx z0\.d, p0/z, z0\.d
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, MAXPOW),
+ z0 = svdiv_z (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s64_z_untied:
+** movprfx z0\.d, p0/z, z1\.d
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, MAXPOW),
+ z0 = svdiv_z (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s64_z_tied1:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** movprfx z0\.d, p0/z, z0\.d
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, INT64_MIN),
+ z0 = svdiv_z (p0, z0, INT64_MIN))
+
+/*
+** div_intmin_s64_z_untied:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** (
+** movprfx z0\.d, p0/z, z1\.d
+** sdiv z0\.d, p0/m, z0\.d, \1
+** |
+** movprfx z0\.d, p0/z, \1
+** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** )
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, INT64_MIN),
+ z0 = svdiv_z (p0, z1, INT64_MIN))
/*
** div_s64_x_tied1:
@@ -216,10 +385,26 @@ TEST_UNIFORM_ZX (div_x0_s64_x_untied, svint64_t, int64_t,
z0 = svdiv_n_s64_x (p0, z1, x0),
z0 = svdiv_x (p0, z1, x0))
+/*
+** div_1_s64_x_tied1:
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, 1),
+ z0 = svdiv_x (p0, z0, 1))
+
+/*
+** div_1_s64_x_untied:
+** mov z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, 1),
+ z0 = svdiv_x (p0, z1, 1))
+
/*
** div_2_s64_x_tied1:
-** mov (z[0-9]+\.d), #2
-** sdiv z0\.d, p0/m, z0\.d, \1
+** asrd z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s64_x_tied1, svint64_t,
@@ -228,10 +413,71 @@ TEST_UNIFORM_Z (div_2_s64_x_tied1, svint64_t,
/*
** div_2_s64_x_untied:
-** mov z0\.d, #2
-** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** movprfx z0, z1
+** asrd z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s64_x_untied, svint64_t,
z0 = svdiv_n_s64_x (p0, z1, 2),
z0 = svdiv_x (p0, z1, 2))
+
+/*
+** div_3_s64_x_tied1:
+** mov (z[0-9]+\.d), #3
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
+
+/*
+** div_3_s64_x_untied:
+** mov z0\.d, #3
+** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
+
+/*
+** div_maxpow_s64_x_tied1:
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, MAXPOW),
+ z0 = svdiv_x (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s64_x_untied:
+** movprfx z0, z1
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, MAXPOW),
+ z0 = svdiv_x (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s64_x_tied1:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, INT64_MIN),
+ z0 = svdiv_x (p0, z0, INT64_MIN))
+
+/*
+** div_intmin_s64_x_untied:
+** mov z0\.d, #-9223372036854775808
+** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, INT64_MIN),
+ z0 = svdiv_x (p0, z1, INT64_MIN))
+
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
index 232ccacf524..9707664caf4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
@@ -2,6 +2,8 @@
#include "test_sve_acle.h"
+#define MAXPOW 1<<31
+
/*
** div_u32_m_tied1:
** udiv z0\.s, p0/m, z0\.s, z1\.s
@@ -53,10 +55,27 @@ TEST_UNIFORM_ZX (div_w0_u32_m_untied, svuint32_t, uint32_t,
z0 = svdiv_n_u32_m (p0, z1, x0),
z0 = svdiv_m (p0, z1, x0))
+/*
+** div_1_u32_m_tied1:
+** sel z0\.s, p0, z0\.s, z0\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_m_tied1, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z0, 1),
+ z0 = svdiv_m (p0, z0, 1))
+
+/*
+** div_1_u32_m_untied:
+** sel z0\.s, p0, z1\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_m_untied, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z1, 1),
+ z0 = svdiv_m (p0, z1, 1))
+
/*
** div_2_u32_m_tied1:
-** mov (z[0-9]+\.s), #2
-** udiv z0\.s, p0/m, z0\.s, \1
+** lsr z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u32_m_tied1, svuint32_t,
@@ -65,15 +84,54 @@ TEST_UNIFORM_Z (div_2_u32_m_tied1, svuint32_t,
/*
** div_2_u32_m_untied:
-** mov (z[0-9]+\.s), #2
** movprfx z0, z1
-** udiv z0\.s, p0/m, z0\.s, \1
+** lsr z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u32_m_untied, svuint32_t,
z0 = svdiv_n_u32_m (p0, z1, 2),
z0 = svdiv_m (p0, z1, 2))
+/*
+** div_3_u32_m_tied1:
+** mov (z[0-9]+\.s), #3
+** udiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_m_tied1, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
+
+/*
+** div_3_u32_m_untied:
+** mov (z[0-9]+\.s), #3
+** movprfx z0, z1
+** udiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_m_untied, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
+
+/*
+** div_maxpow_u32_m_tied1:
+** lsr z0\.s, p0/m, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_m_tied1, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z0, MAXPOW),
+ z0 = svdiv_m (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u32_m_untied:
+** movprfx z0, z1
+** lsr z0\.s, p0/m, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_m_untied, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z1, MAXPOW),
+ z0 = svdiv_m (p0, z1, MAXPOW))
+
/*
** div_u32_z_tied1:
** movprfx z0\.s, p0/z, z0\.s
@@ -137,19 +195,61 @@ TEST_UNIFORM_ZX (div_w0_u32_z_untied, svuint32_t, uint32_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_u32_z_tied1:
-** mov (z[0-9]+\.s), #2
+** div_1_u32_z_tied1:
+** mov (z[0-9]+\.s), #1
** movprfx z0\.s, p0/z, z0\.s
** udiv z0\.s, p0/m, z0\.s, \1
** ret
*/
+TEST_UNIFORM_Z (div_1_u32_z_tied1, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z0, 1),
+ z0 = svdiv_z (p0, z0, 1))
+
+/*
+** div_1_u32_z_untied:
+** mov z0\.s, #1
+** movprfx z0\.s, p0/z, z0\.s
+** udivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_z_untied, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z1, 1),
+ z0 = svdiv_z (p0, z1, 1))
+
+/*
+** div_2_u32_z_tied1:
+** movprfx z0\.s, p0/z, z0\.s
+** lsr z0\.s, p0/m, z0\.s, #1
+** ret
+*/
TEST_UNIFORM_Z (div_2_u32_z_tied1, svuint32_t,
z0 = svdiv_n_u32_z (p0, z0, 2),
z0 = svdiv_z (p0, z0, 2))
/*
** div_2_u32_z_untied:
-** mov (z[0-9]+\.s), #2
+** movprfx z0\.s, p0/z, z1\.s
+** lsr z0\.s, p0/m, z0\.s, #1
+** ret
+*/
+TEST_UNIFORM_Z (div_2_u32_z_untied, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z1, 2),
+ z0 = svdiv_z (p0, z1, 2))
+
+/*
+** div_3_u32_z_tied1:
+** mov (z[0-9]+\.s), #3
+** movprfx z0\.s, p0/z, z0\.s
+** udiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_z_tied1, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
+
+/*
+** div_3_u32_z_untied:
+** mov (z[0-9]+\.s), #3
** (
** movprfx z0\.s, p0/z, z1\.s
** udiv z0\.s, p0/m, z0\.s, \1
@@ -159,9 +259,29 @@ TEST_UNIFORM_Z (div_2_u32_z_tied1, svuint32_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_u32_z_untied, svuint32_t,
- z0 = svdiv_n_u32_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_u32_z_untied, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
+
+/*
+** div_maxpow_u32_z_tied1:
+** movprfx z0\.s, p0/z, z0\.s
+** lsr z0\.s, p0/m, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_z_tied1, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z0, MAXPOW),
+ z0 = svdiv_z (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u32_z_untied:
+** movprfx z0\.s, p0/z, z1\.s
+** lsr z0\.s, p0/m, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_z_untied, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z1, MAXPOW),
+ z0 = svdiv_z (p0, z1, MAXPOW))
/*
** div_u32_x_tied1:
@@ -216,10 +336,26 @@ TEST_UNIFORM_ZX (div_w0_u32_x_untied, svuint32_t, uint32_t,
z0 = svdiv_n_u32_x (p0, z1, x0),
z0 = svdiv_x (p0, z1, x0))
+/*
+** div_1_u32_x_tied1:
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_x_tied1, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z0, 1),
+ z0 = svdiv_x (p0, z0, 1))
+
+/*
+** div_1_u32_x_untied:
+** mov z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_x_untied, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z1, 1),
+ z0 = svdiv_x (p0, z1, 1))
+
/*
** div_2_u32_x_tied1:
-** mov (z[0-9]+\.s), #2
-** udiv z0\.s, p0/m, z0\.s, \1
+** lsr z0\.s, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u32_x_tied1, svuint32_t,
@@ -228,10 +364,47 @@ TEST_UNIFORM_Z (div_2_u32_x_tied1, svuint32_t,
/*
** div_2_u32_x_untied:
-** mov z0\.s, #2
-** udivr z0\.s, p0/m, z0\.s, z1\.s
+** lsr z0\.s, z1\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u32_x_untied, svuint32_t,
z0 = svdiv_n_u32_x (p0, z1, 2),
z0 = svdiv_x (p0, z1, 2))
+
+/*
+** div_3_u32_x_tied1:
+** mov (z[0-9]+\.s), #3
+** udiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_x_tied1, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
+
+/*
+** div_3_u32_x_untied:
+** mov z0\.s, #3
+** udivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_x_untied, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
+
+/*
+** div_maxpow_u32_x_tied1:
+** lsr z0\.s, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_x_tied1, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z0, MAXPOW),
+ z0 = svdiv_x (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u32_x_untied:
+** lsr z0\.s, z1\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_x_untied, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z1, MAXPOW),
+ z0 = svdiv_x (p0, z1, MAXPOW))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
index ac7c026eea3..5247ebdac7a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
@@ -2,6 +2,8 @@
#include "test_sve_acle.h"
+#define MAXPOW 1ULL<<63
+
/*
** div_u64_m_tied1:
** udiv z0\.d, p0/m, z0\.d, z1\.d
@@ -53,10 +55,27 @@ TEST_UNIFORM_ZX (div_x0_u64_m_untied, svuint64_t, uint64_t,
z0 = svdiv_n_u64_m (p0, z1, x0),
z0 = svdiv_m (p0, z1, x0))
+/*
+** div_1_u64_m_tied1:
+** sel z0\.d, p0, z0\.d, z0\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_m_tied1, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z0, 1),
+ z0 = svdiv_m (p0, z0, 1))
+
+/*
+** div_1_u64_m_untied:
+** sel z0\.d, p0, z1\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_m_untied, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z1, 1),
+ z0 = svdiv_m (p0, z1, 1))
+
/*
** div_2_u64_m_tied1:
-** mov (z[0-9]+\.d), #2
-** udiv z0\.d, p0/m, z0\.d, \1
+** lsr z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u64_m_tied1, svuint64_t,
@@ -65,15 +84,54 @@ TEST_UNIFORM_Z (div_2_u64_m_tied1, svuint64_t,
/*
** div_2_u64_m_untied:
-** mov (z[0-9]+\.d), #2
** movprfx z0, z1
-** udiv z0\.d, p0/m, z0\.d, \1
+** lsr z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u64_m_untied, svuint64_t,
z0 = svdiv_n_u64_m (p0, z1, 2),
z0 = svdiv_m (p0, z1, 2))
+/*
+** div_3_u64_m_tied1:
+** mov (z[0-9]+\.d), #3
+** udiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_m_tied1, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
+
+/*
+** div_3_u64_m_untied:
+** mov (z[0-9]+\.d), #3
+** movprfx z0, z1
+** udiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_m_untied, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
+
+/*
+** div_maxpow_u64_m_tied1:
+** lsr z0\.d, p0/m, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_m_tied1, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z0, MAXPOW),
+ z0 = svdiv_m (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u64_m_untied:
+** movprfx z0, z1
+** lsr z0\.d, p0/m, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_m_untied, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z1, MAXPOW),
+ z0 = svdiv_m (p0, z1, MAXPOW))
+
/*
** div_u64_z_tied1:
** movprfx z0\.d, p0/z, z0\.d
@@ -137,19 +195,61 @@ TEST_UNIFORM_ZX (div_x0_u64_z_untied, svuint64_t, uint64_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_u64_z_tied1:
-** mov (z[0-9]+\.d), #2
+** div_1_u64_z_tied1:
+** mov (z[0-9]+\.d), #1
** movprfx z0\.d, p0/z, z0\.d
** udiv z0\.d, p0/m, z0\.d, \1
** ret
*/
+TEST_UNIFORM_Z (div_1_u64_z_tied1, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z0, 1),
+ z0 = svdiv_z (p0, z0, 1))
+
+/*
+** div_1_u64_z_untied:
+** mov z0\.d, #1
+** movprfx z0\.d, p0/z, z0\.d
+** udivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_z_untied, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z1, 1),
+ z0 = svdiv_z (p0, z1, 1))
+
+/*
+** div_2_u64_z_tied1:
+** movprfx z0\.d, p0/z, z0\.d
+** lsr z0\.d, p0/m, z0\.d, #1
+** ret
+*/
TEST_UNIFORM_Z (div_2_u64_z_tied1, svuint64_t,
z0 = svdiv_n_u64_z (p0, z0, 2),
z0 = svdiv_z (p0, z0, 2))
/*
** div_2_u64_z_untied:
-** mov (z[0-9]+\.d), #2
+** movprfx z0\.d, p0/z, z1\.d
+** lsr z0\.d, p0/m, z0\.d, #1
+** ret
+*/
+TEST_UNIFORM_Z (div_2_u64_z_untied, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z1, 2),
+ z0 = svdiv_z (p0, z1, 2))
+
+/*
+** div_3_u64_z_tied1:
+** mov (z[0-9]+\.d), #3
+** movprfx z0\.d, p0/z, z0\.d
+** udiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_z_tied1, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
+
+/*
+** div_3_u64_z_untied:
+** mov (z[0-9]+\.d), #3
** (
** movprfx z0\.d, p0/z, z1\.d
** udiv z0\.d, p0/m, z0\.d, \1
@@ -159,9 +259,29 @@ TEST_UNIFORM_Z (div_2_u64_z_tied1, svuint64_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_u64_z_untied, svuint64_t,
- z0 = svdiv_n_u64_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_u64_z_untied, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
+
+/*
+** div_maxpow_u64_z_tied1:
+** movprfx z0\.d, p0/z, z0\.d
+** lsr z0\.d, p0/m, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_z_tied1, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z0, MAXPOW),
+ z0 = svdiv_z (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u64_z_untied:
+** movprfx z0\.d, p0/z, z1\.d
+** lsr z0\.d, p0/m, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_z_untied, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z1, MAXPOW),
+ z0 = svdiv_z (p0, z1, MAXPOW))
/*
** div_u64_x_tied1:
@@ -216,10 +336,26 @@ TEST_UNIFORM_ZX (div_x0_u64_x_untied, svuint64_t, uint64_t,
z0 = svdiv_n_u64_x (p0, z1, x0),
z0 = svdiv_x (p0, z1, x0))
+/*
+** div_1_u64_x_tied1:
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_x_tied1, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z0, 1),
+ z0 = svdiv_x (p0, z0, 1))
+
+/*
+** div_1_u64_x_untied:
+** mov z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_x_untied, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z1, 1),
+ z0 = svdiv_x (p0, z1, 1))
+
/*
** div_2_u64_x_tied1:
-** mov (z[0-9]+\.d), #2
-** udiv z0\.d, p0/m, z0\.d, \1
+** lsr z0\.d, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u64_x_tied1, svuint64_t,
@@ -228,10 +364,47 @@ TEST_UNIFORM_Z (div_2_u64_x_tied1, svuint64_t,
/*
** div_2_u64_x_untied:
-** mov z0\.d, #2
-** udivr z0\.d, p0/m, z0\.d, z1\.d
+** lsr z0\.d, z1\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u64_x_untied, svuint64_t,
z0 = svdiv_n_u64_x (p0, z1, 2),
z0 = svdiv_x (p0, z1, 2))
+
+/*
+** div_3_u64_x_tied1:
+** mov (z[0-9]+\.d), #3
+** udiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_x_tied1, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
+
+/*
+** div_3_u64_x_untied:
+** mov z0\.d, #3
+** udivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_x_untied, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
+
+/*
+** div_maxpow_u64_x_tied1:
+** lsr z0\.d, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_x_tied1, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z0, MAXPOW),
+ z0 = svdiv_x (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u64_x_untied:
+** lsr z0\.d, z1\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_x_untied, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z1, MAXPOW),
+ z0 = svdiv_x (p0, z1, MAXPOW))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
new file mode 100644
index 00000000000..1a3c25b817d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
@@ -0,0 +1,91 @@
+/* { dg-do run { target aarch64_sve128_hw } } */
+/* { dg-options "-O2 -msve-vector-bits=128" } */
+
+#include <arm_sve.h>
+#include <stdint.h>
+
+typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat16_t svfloat16_ __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat32_t svfloat32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat64_t svfloat64_ __attribute__((arm_sve_vector_bits(128)));
+typedef svint32_t svint32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svint64_t svint64_ __attribute__((arm_sve_vector_bits(128)));
+typedef svuint32_t svuint32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128)));
+
+#define F(T, TS, P, OP1, OP2) \
+{ \
+ T##_t op1 = (T##_t) OP1; \
+ T##_t op2 = (T##_t) OP2; \
+ sv##T##_ res = svdiv_##P (pg, svdup_##TS (op1), svdup_##TS (op2)); \
+ sv##T##_ exp = svdup_##TS (op1 / op2); \
+ if (svptest_any (pg, svcmpne (pg, exp, res))) \
+ __builtin_abort (); \
+ \
+ sv##T##_ res_n = svdiv_##P (pg, svdup_##TS (op1), op2); \
+ if (svptest_any (pg, svcmpne (pg, exp, res_n))) \
+ __builtin_abort (); \
+}
+
+#define TEST_TYPES_1(T, TS) \
+ F (T, TS, m, 79, 16) \
+ F (T, TS, z, 79, 16) \
+ F (T, TS, x, 79, 16)
+
+#define TEST_TYPES \
+ TEST_TYPES_1 (float16, f16) \
+ TEST_TYPES_1 (float32, f32) \
+ TEST_TYPES_1 (float64, f64) \
+ TEST_TYPES_1 (int32, s32) \
+ TEST_TYPES_1 (int64, s64) \
+ TEST_TYPES_1 (uint32, u32) \
+ TEST_TYPES_1 (uint64, u64)
+
+#define TEST_VALUES_S_1(B, OP1, OP2) \
+ F (int##B, s##B, x, OP1, OP2)
+
+#define TEST_VALUES_S \
+ TEST_VALUES_S_1 (32, INT32_MIN, INT32_MIN) \
+ TEST_VALUES_S_1 (64, INT64_MIN, INT64_MIN) \
+ TEST_VALUES_S_1 (32, -7, 4) \
+ TEST_VALUES_S_1 (64, -7, 4) \
+ TEST_VALUES_S_1 (32, INT32_MAX, (1 << 30)) \
+ TEST_VALUES_S_1 (64, INT64_MAX, (1ULL << 62)) \
+ TEST_VALUES_S_1 (32, INT32_MIN, (1 << 30)) \
+ TEST_VALUES_S_1 (64, INT64_MIN, (1ULL << 62)) \
+ TEST_VALUES_S_1 (32, INT32_MAX, 1) \
+ TEST_VALUES_S_1 (64, INT64_MAX, 1) \
+ TEST_VALUES_S_1 (32, INT32_MIN, 16) \
+ TEST_VALUES_S_1 (64, INT64_MIN, 16) \
+ TEST_VALUES_S_1 (32, INT32_MAX, -5) \
+ TEST_VALUES_S_1 (64, INT64_MAX, -5) \
+ TEST_VALUES_S_1 (32, INT32_MIN, -4) \
+ TEST_VALUES_S_1 (64, INT64_MIN, -4)
+
+#define TEST_VALUES_U_1(B, OP1, OP2) \
+ F (uint##B, u##B, x, OP1, OP2)
+
+#define TEST_VALUES_U \
+ TEST_VALUES_U_1 (32, UINT32_MAX, UINT32_MAX) \
+ TEST_VALUES_U_1 (64, UINT64_MAX, UINT64_MAX) \
+ TEST_VALUES_U_1 (32, UINT32_MAX, (1 << 31)) \
+ TEST_VALUES_U_1 (64, UINT64_MAX, (1ULL << 63)) \
+ TEST_VALUES_U_1 (32, 7, 4) \
+ TEST_VALUES_U_1 (64, 7, 4) \
+ TEST_VALUES_U_1 (32, 7, 3) \
+ TEST_VALUES_U_1 (64, 7, 3) \
+ TEST_VALUES_U_1 (32, 11, 1) \
+ TEST_VALUES_U_1 (64, 11, 1)
+
+#define TEST_VALUES \
+ TEST_VALUES_S \
+ TEST_VALUES_U
+
+int
+main (void)
+{
+ const pred pg = svptrue_b64 ();
+ TEST_TYPES
+ TEST_VALUES
+ return 0;
+}
--
2.44.0
[-- Attachment #1.3: Type: text/plain, Size: 21930 bytes --]
> On 17 Jul 2024, at 09:57, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> This patch folds signed SVE division where all divisor elements are the same
>> power of 2 to svasrd. Tests were added to check 1) whether the transform is
>> applied, i.e. asrd is used, and 2) correctness for all possible input types
>> for svdiv, predication, and a variety of values. As the transform is applied
>> only to signed integers, correctness for predication and values was only
>> tested for svint32_t and svint64_t.
>> Existing svdiv tests were adjusted such that the divisor is no longer a
>> power of 2.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>>
>> gcc/
>>
>> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
>> fold and expand.
>>
>> gcc/testsuite/
>>
>> * gcc.target/aarch64/sve/div_const_1.c: New test.
>> * gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
>> * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
>> * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>>
>> From e8ffbab52ad7b9307cbfc9dbca4ef4d20e08804b Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz <jschmitz@nvidia.com>
>> Date: Tue, 16 Jul 2024 01:59:50 -0700
>> Subject: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by
>> constant.
>>
>> This patch folds signed SVE division where all divisor elements are the same
>> power of 2 to svasrd. Tests were added to check 1) whether the transform is
>> applied, i.e. asrd is used, and 2) correctness for all possible input types
>> for svdiv, predication, and a variety of values. As the transform is applied
>> only to signed integers, correctness for predication and values was only
>> tested for svint32_t and svint64_t.
>> Existing svdiv tests were adjusted such that the divisor is no longer a
>> power of 2.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>>
>> gcc/
>>
>> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
>> fold and expand.
>>
>> gcc/testsuite/
>>
>> * gcc.target/aarch64/sve/div_const_1.c: New test.
>> * gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
>> * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
>> * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc | 44 ++++++++-
>> .../gcc.target/aarch64/sve/acle/asm/div_s32.c | 60 ++++++------
>> .../gcc.target/aarch64/sve/acle/asm/div_s64.c | 60 ++++++------
>> .../gcc.target/aarch64/sve/div_const_1.c | 34 +++++++
>> .../gcc.target/aarch64/sve/div_const_1_run.c | 91 +++++++++++++++++++
>> 5 files changed, 228 insertions(+), 61 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
>>
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index aa26370d397..d821cc96588 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -746,6 +746,48 @@ public:
>> }
>> };
>>
>> +class svdiv_impl : public unspec_based_function
>> +{
>> +public:
>> + CONSTEXPR svdiv_impl ()
>> + : unspec_based_function (DIV, UDIV, UNSPEC_COND_FDIV) {}
>> +
>> + gimple *
>> + fold (gimple_folder &f) const override
>> + {
>> + tree divisor = gimple_call_arg (f.call, 2);
>> + tree divisor_cst = uniform_integer_cst_p (divisor);
>> +
>> + if (f.type_suffix (0).unsigned_p)
>> + {
>> + return NULL;
>> + }
>
> We might as well test this first, since it doesn't depend on the
> divisor_cst result.
>
> Formatting nit: should be no braces for single statements, so:
>
> if (f.type_suffix (0).unsigned_p)
> return NULL;
>
> Same for the others.
Done.
>
>> +
>> + if (!divisor_cst)
>> + {
>> + return NULL;
>> + }
>> +
>> + if (!integer_pow2p (divisor_cst))
>> + {
>> + return NULL;
>> + }
>> +
>> + function_instance instance ("svasrd", functions::svasrd, shapes::shift_right_imm, MODE_n, f.type_suffix_ids, GROUP_none, f.pred);
>
> This line is above the 80 character limit. Maybe:
>
> function_instance instance ("svasrd", functions::svasrd,
> shapes::shift_right_imm, MODE_n,
> f.type_suffix_ids, GROUP_none, f.pred);
Done.
>> + gcall *call = as_a <gcall *> (f.redirect_call (instance));
>
> Looks like an oversight that redirect_call doesn't return a gcall directly.
> IMO it'd be better to fix that instead.
I submitted a patch fixing this.
>> + tree shift_amt = wide_int_to_tree (TREE_TYPE (divisor_cst), tree_log2 (divisor_cst));
>
> This ought to have type uint64_t instead, to match the function prototype.
> That can be had from scalar_types[VECTOR_TYPE_svuint64_t].
Done, thank you.
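For reference, the svasrd shift amount in the revised patch (attached below) is now built with the uint64 scalar type, roughly:

  /* The shift amount must be a uint64_t to match the svasrd_n prototype.  */
  new_divisor = wide_int_to_tree (scalar_types[VECTOR_TYPE_svuint64_t],
				  tree_log2 (divisor_cst));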
>
>> + gimple_call_set_arg (call, 2, shift_amt);
>> + return call;
>> + }
>> +
>> + rtx
>> + expand (function_expander &e) const override
>> + {
>> + return e.map_to_rtx_codes (DIV, UDIV, UNSPEC_COND_FDIV, -1, DEFAULT_MERGE_ARGNO);
>> + }
>
> This shouldn't be necessary, given the inheritance from unspec_based_function.
>
I should have used inheritance from rtx_code_function to make this work; that is now fixed.
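Concretely, the revised class now begins:

  class svdiv_impl : public rtx_code_function
  {
  public:
    CONSTEXPR svdiv_impl ()
      : rtx_code_function (DIV, UDIV, UNSPEC_COND_FDIV) {}

    /* Only fold () is overridden; expansion is inherited from
       rtx_code_function.  */
    ...
  };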
>> +};
>> +
>> +
>> class svdot_impl : public function_base
>> {
>> public:
>> @@ -3043,7 +3085,7 @@ FUNCTION (svcreate3, svcreate_impl, (3))
>> FUNCTION (svcreate4, svcreate_impl, (4))
>> FUNCTION (svcvt, svcvt_impl,)
>> FUNCTION (svcvtnt, CODE_FOR_MODE0 (aarch64_sve_cvtnt),)
>> -FUNCTION (svdiv, rtx_code_function, (DIV, UDIV, UNSPEC_COND_FDIV))
>> +FUNCTION (svdiv, svdiv_impl,)
>> FUNCTION (svdivr, rtx_code_function_rotated, (DIV, UDIV, UNSPEC_COND_FDIV))
>> FUNCTION (svdot, svdot_impl,)
>> FUNCTION (svdot_lane, svdotprod_lane_impl, (UNSPEC_SDOT, UNSPEC_UDOT,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>> index c49ca1aa524..da2fe7c5451 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>> @@ -54,25 +54,25 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
>> z0 = svdiv_m (p0, z1, x0))
>>
>> /*
>> -** div_2_s32_m_tied1:
>> -** mov (z[0-9]+\.s), #2
>> +** div_3_s32_m_tied1:
>> +** mov (z[0-9]+\.s), #3
>> ** sdiv z0\.s, p0/m, z0\.s, \1
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s32_m_tied1, svint32_t,
>> - z0 = svdiv_n_s32_m (p0, z0, 2),
>> - z0 = svdiv_m (p0, z0, 2))
>> +TEST_UNIFORM_Z (div_3_s32_m_tied1, svint32_t,
>> + z0 = svdiv_n_s32_m (p0, z0, 3),
>> + z0 = svdiv_m (p0, z0, 3))
>
> I think we should test both 2 and 3, using this harness to make sure
> that svdiv of 2 does become svasrd. (Especially since the new test
> is specific to fixed-length vectors.)
>
> It would be good to test the limits too: 1 and 1<<30. Presumably
> 0b1000... (-1<<31) shouldn't be optimised, so we should test that too.
>
> Same idea (with adjusted limits) for s64.
There are now tests for 1, 2, 3, and 1<<30 (or 1ULL<<62 for the 64-bit types) for u32, u64, s32, and s64, and also INT32_MIN/INT64_MIN for s32 and s64.
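For example, the revised div_s32.c harness (attached below) checks the upper power-of-two limit like this:

  #define MAXPOW 1<<30

  /*
  ** div_maxpow_s32_m_tied1:
  **	asrd	z0\.s, p0/m, z0\.s, #30
  **	ret
  */
  TEST_UNIFORM_Z (div_maxpow_s32_m_tied1, svint32_t,
		  z0 = svdiv_n_s32_m (p0, z0, MAXPOW),
		  z0 = svdiv_m (p0, z0, MAXPOW))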
>
> Thanks,
> Richard
>
>>
>> /*
>> -** div_2_s32_m_untied:
>> -** mov (z[0-9]+\.s), #2
>> +** div_3_s32_m_untied:
>> +** mov (z[0-9]+\.s), #3
>> ** movprfx z0, z1
>> ** sdiv z0\.s, p0/m, z0\.s, \1
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s32_m_untied, svint32_t,
>> - z0 = svdiv_n_s32_m (p0, z1, 2),
>> - z0 = svdiv_m (p0, z1, 2))
>> +TEST_UNIFORM_Z (div_3_s32_m_untied, svint32_t,
>> + z0 = svdiv_n_s32_m (p0, z1, 3),
>> + z0 = svdiv_m (p0, z1, 3))
>>
>> /*
>> ** div_s32_z_tied1:
>> @@ -137,19 +137,19 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, int32_t,
>> z0 = svdiv_z (p0, z1, x0))
>>
>> /*
>> -** div_2_s32_z_tied1:
>> -** mov (z[0-9]+\.s), #2
>> +** div_3_s32_z_tied1:
>> +** mov (z[0-9]+\.s), #3
>> ** movprfx z0\.s, p0/z, z0\.s
>> ** sdiv z0\.s, p0/m, z0\.s, \1
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
>> - z0 = svdiv_n_s32_z (p0, z0, 2),
>> - z0 = svdiv_z (p0, z0, 2))
>> +TEST_UNIFORM_Z (div_3_s32_z_tied1, svint32_t,
>> + z0 = svdiv_n_s32_z (p0, z0, 3),
>> + z0 = svdiv_z (p0, z0, 3))
>>
>> /*
>> -** div_2_s32_z_untied:
>> -** mov (z[0-9]+\.s), #2
>> +** div_3_s32_z_untied:
>> +** mov (z[0-9]+\.s), #3
>> ** (
>> ** movprfx z0\.s, p0/z, z1\.s
>> ** sdiv z0\.s, p0/m, z0\.s, \1
>> @@ -159,9 +159,9 @@ TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
>> ** )
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s32_z_untied, svint32_t,
>> - z0 = svdiv_n_s32_z (p0, z1, 2),
>> - z0 = svdiv_z (p0, z1, 2))
>> +TEST_UNIFORM_Z (div_3_s32_z_untied, svint32_t,
>> + z0 = svdiv_n_s32_z (p0, z1, 3),
>> + z0 = svdiv_z (p0, z1, 3))
>>
>> /*
>> ** div_s32_x_tied1:
>> @@ -217,21 +217,21 @@ TEST_UNIFORM_ZX (div_w0_s32_x_untied, svint32_t, int32_t,
>> z0 = svdiv_x (p0, z1, x0))
>>
>> /*
>> -** div_2_s32_x_tied1:
>> -** mov (z[0-9]+\.s), #2
>> +** div_3_s32_x_tied1:
>> +** mov (z[0-9]+\.s), #3
>> ** sdiv z0\.s, p0/m, z0\.s, \1
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s32_x_tied1, svint32_t,
>> - z0 = svdiv_n_s32_x (p0, z0, 2),
>> - z0 = svdiv_x (p0, z0, 2))
>> +TEST_UNIFORM_Z (div_3_s32_x_tied1, svint32_t,
>> + z0 = svdiv_n_s32_x (p0, z0, 3),
>> + z0 = svdiv_x (p0, z0, 3))
>>
>> /*
>> -** div_2_s32_x_untied:
>> -** mov z0\.s, #2
>> +** div_3_s32_x_untied:
>> +** mov z0\.s, #3
>> ** sdivr z0\.s, p0/m, z0\.s, z1\.s
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s32_x_untied, svint32_t,
>> - z0 = svdiv_n_s32_x (p0, z1, 2),
>> - z0 = svdiv_x (p0, z1, 2))
>> +TEST_UNIFORM_Z (div_3_s32_x_untied, svint32_t,
>> + z0 = svdiv_n_s32_x (p0, z1, 3),
>> + z0 = svdiv_x (p0, z1, 3))
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
>> index 464dca28d74..e4af406344b 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
>> @@ -54,25 +54,25 @@ TEST_UNIFORM_ZX (div_x0_s64_m_untied, svint64_t, int64_t,
>> z0 = svdiv_m (p0, z1, x0))
>>
>> /*
>> -** div_2_s64_m_tied1:
>> -** mov (z[0-9]+\.d), #2
>> +** div_3_s64_m_tied1:
>> +** mov (z[0-9]+\.d), #3
>> ** sdiv z0\.d, p0/m, z0\.d, \1
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s64_m_tied1, svint64_t,
>> - z0 = svdiv_n_s64_m (p0, z0, 2),
>> - z0 = svdiv_m (p0, z0, 2))
>> +TEST_UNIFORM_Z (div_3_s64_m_tied1, svint64_t,
>> + z0 = svdiv_n_s64_m (p0, z0, 3),
>> + z0 = svdiv_m (p0, z0, 3))
>>
>> /*
>> -** div_2_s64_m_untied:
>> -** mov (z[0-9]+\.d), #2
>> +** div_3_s64_m_untied:
>> +** mov (z[0-9]+\.d), #3
>> ** movprfx z0, z1
>> ** sdiv z0\.d, p0/m, z0\.d, \1
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s64_m_untied, svint64_t,
>> - z0 = svdiv_n_s64_m (p0, z1, 2),
>> - z0 = svdiv_m (p0, z1, 2))
>> +TEST_UNIFORM_Z (div_3_s64_m_untied, svint64_t,
>> + z0 = svdiv_n_s64_m (p0, z1, 3),
>> + z0 = svdiv_m (p0, z1, 3))
>>
>> /*
>> ** div_s64_z_tied1:
>> @@ -137,19 +137,19 @@ TEST_UNIFORM_ZX (div_x0_s64_z_untied, svint64_t, int64_t,
>> z0 = svdiv_z (p0, z1, x0))
>>
>> /*
>> -** div_2_s64_z_tied1:
>> -** mov (z[0-9]+\.d), #2
>> +** div_3_s64_z_tied1:
>> +** mov (z[0-9]+\.d), #3
>> ** movprfx z0\.d, p0/z, z0\.d
>> ** sdiv z0\.d, p0/m, z0\.d, \1
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
>> - z0 = svdiv_n_s64_z (p0, z0, 2),
>> - z0 = svdiv_z (p0, z0, 2))
>> +TEST_UNIFORM_Z (div_3_s64_z_tied1, svint64_t,
>> + z0 = svdiv_n_s64_z (p0, z0, 3),
>> + z0 = svdiv_z (p0, z0, 3))
>>
>> /*
>> -** div_2_s64_z_untied:
>> -** mov (z[0-9]+\.d), #2
>> +** div_3_s64_z_untied:
>> +** mov (z[0-9]+\.d), #3
>> ** (
>> ** movprfx z0\.d, p0/z, z1\.d
>> ** sdiv z0\.d, p0/m, z0\.d, \1
>> @@ -159,9 +159,9 @@ TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
>> ** )
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s64_z_untied, svint64_t,
>> - z0 = svdiv_n_s64_z (p0, z1, 2),
>> - z0 = svdiv_z (p0, z1, 2))
>> +TEST_UNIFORM_Z (div_3_s64_z_untied, svint64_t,
>> + z0 = svdiv_n_s64_z (p0, z1, 3),
>> + z0 = svdiv_z (p0, z1, 3))
>>
>> /*
>> ** div_s64_x_tied1:
>> @@ -217,21 +217,21 @@ TEST_UNIFORM_ZX (div_x0_s64_x_untied, svint64_t, int64_t,
>> z0 = svdiv_x (p0, z1, x0))
>>
>> /*
>> -** div_2_s64_x_tied1:
>> -** mov (z[0-9]+\.d), #2
>> +** div_3_s64_x_tied1:
>> +** mov (z[0-9]+\.d), #3
>> ** sdiv z0\.d, p0/m, z0\.d, \1
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s64_x_tied1, svint64_t,
>> - z0 = svdiv_n_s64_x (p0, z0, 2),
>> - z0 = svdiv_x (p0, z0, 2))
>> +TEST_UNIFORM_Z (div_3_s64_x_tied1, svint64_t,
>> + z0 = svdiv_n_s64_x (p0, z0, 3),
>> + z0 = svdiv_x (p0, z0, 3))
>>
>> /*
>> -** div_2_s64_x_untied:
>> -** mov z0\.d, #2
>> +** div_3_s64_x_untied:
>> +** mov z0\.d, #3
>> ** sdivr z0\.d, p0/m, z0\.d, z1\.d
>> ** ret
>> */
>> -TEST_UNIFORM_Z (div_2_s64_x_untied, svint64_t,
>> - z0 = svdiv_n_s64_x (p0, z1, 2),
>> - z0 = svdiv_x (p0, z1, 2))
>> +TEST_UNIFORM_Z (div_3_s64_x_untied, svint64_t,
>> + z0 = svdiv_n_s64_x (p0, z1, 3),
>> + z0 = svdiv_x (p0, z1, 3))
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
>> new file mode 100644
>> index 00000000000..ac6ef1c73d4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
>> @@ -0,0 +1,34 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -msve-vector-bits=128" } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +
>> +#include <arm_sve.h>
>> +
>> +typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
>> +typedef svint64_t svint64_2 __attribute__((arm_sve_vector_bits(128)));
>> +typedef svuint64_t svuint64_2 __attribute__((arm_sve_vector_bits(128)));
>> +
>> +/*
>> +** f1:
>> +** ptrue (p[0-7])\.b, vl16
>> +** asrd (z[0-9]+\.d), \1/m, \2, #2
>> +** ret
>> +*/
>> +svint64_2 f1 (svint64_2 p)
>> +{
>> + const pred pg = svptrue_b64 ();
>> + return svdiv_x (pg, p, (svint64_2) {4, 4});
>> +}
>> +
>> +/*
>> +** f2:
>> +** ptrue (p[0-7])\.b, vl16
>> +** mov (z[0-9]+\.d), #4
>> +** udiv (z[0-9]+\.d), \1/m, \3, \2
>> +** ret
>> +*/
>> +svuint64_2 f2 (svuint64_2 p)
>> +{
>> + const pred pg = svptrue_b64 ();
>> + return svdiv_x (pg, p, (svuint64_2) {4, 4});
>> +}
This test was dropped because it is redundant with the existing test harness.
>
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
>> new file mode 100644
>> index 00000000000..a15c597d5bd
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
>> @@ -0,0 +1,91 @@
>> +/* { dg-do run { target aarch64_sve_hw } } */
>> +/* { dg-options "-O2 -msve-vector-bits=128" } */
>> +
>> +#include <arm_sve.h>
>> +#include <stdint.h>
>> +
>> +typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
>> +typedef svfloat16_t svfloat16_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svfloat32_t svfloat32_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svfloat64_t svfloat64_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svint32_t svint32_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svint64_t svint64_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svuint32_t svuint32_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128)));
>> +
>> +#define T1(TY, TYS, P) \
>> +{ \
>> + TY##_t a = (TY##_t) 79; \
>> + TY##_t b = (TY##_t) 16; \
>> + sv##TY##_ res = svdiv_##P (pg, svdup_##TYS (a), svdup_##TYS (b)); \
>> + sv##TY##_ exp = svdup_##TYS (a / b); \
>> + if (svptest_any (pg, svcmpne (pg, exp, res))) \
>> + __builtin_abort (); \
>> +}
>> +
>> +#define T2(B) \
>> +{ \
>> + int##B##_t a[] = {0, -1, 1, INT##B##_MAX, INT##B##_MIN, -5, 5}; \
>> + int##B##_t b[] = {-1, 1, -4, 4, -5, 5, INT##B##_MAX, INT##B##_MIN}; \
>> + int length_a = sizeof (a) / sizeof (a[0]); \
>> + int length_b = sizeof (b) / sizeof (b[0]); \
>> + for (int i = 0; i < length_a; ++i) \
>> + { \
>> + for (int j = 0; j < length_b; ++j) \
>> + { \
>> + svint##B##_ op1 = svdup_s##B (a[i]); \
>> + svint##B##_ op2 = svdup_s##B (b[j]); \
>> + svint##B##_ res = svdiv_x (pg, op1, op2); \
>> + svint##B##_ exp = svdup_s##B (a[i] / b[j]); \
>> + if (svptest_any (pg, svcmpne (pg, exp, res))) \
>> + __builtin_abort (); \
>> + } \
>> + } \
>> +}
>> +
>> +#define TEST_VALUES_ASRD2 \
>> +{ \
>> + svint32_ op1_32 = (svint32_) {0, 16, -79, -1}; \
>> + svint32_ op2_32 = (svint32_) {5, 8, -32, 1}; \
>> + svint32_ res_32 = svdiv_x (pg, op1_32, op2_32); \
>> + svint32_ exp_32 = (svint32_) {0 / 5, 16 / 8, -79 / -32, -1 / 1}; \
>> + if (svptest_any (pg, svcmpne (pg, exp_32, res_32))) \
>> + __builtin_abort (); \
>> + \
>> + svint64_ op1_64 = (svint64_) {83, -11}; \
>> + svint64_ op2_64 = (svint64_) {16, 5}; \
>> + svint64_ res_64 = svdiv_x (pg, op1_64, op2_64); \
>> + svint64_ exp_64 = (svint64_) {83 / 16, -11 / 5}; \
>> + if (svptest_any (pg, svcmpne (pg, exp_64, res_64))) \
>> + __builtin_abort (); \
>> +}
>> +
>> +#define TEST_TYPES(T) \
>> + T (float16, f16, x) \
>> + T (float32, f32, x) \
>> + T (float64, f64, x) \
>> + T (int32, s32, x) \
>> + T (int64, s64, x) \
>> + T (uint32, u32, x) \
>> + T (uint64, u64, x) \
>> +
>> +#define TEST_PREDICATION(T) \
>> + T (int32, s32, z) \
>> + T (int32, s32, m) \
>> + T (int64, s64, z) \
>> + T (int64, s64, m) \
>> +
>> +#define TEST_VALUES_ASRD1(T) \
>> + T (32) \
>> + T (64) \
>> +
>> +int
>> +main (void)
>> +{
>> + const pred pg = svptrue_b64 ();
>> + TEST_TYPES (T1)
>> + TEST_PREDICATION (T1)
>> + TEST_VALUES_ASRD1 (T2)
>> + TEST_VALUES_ASRD2
>> + return 0;
>> +}
This runtime test was revised to have individual scopes for each test.
* Re: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.
2024-07-29 14:07 ` Jennifer Schmitz
@ 2024-07-29 20:55 ` Richard Sandiford
2024-07-30 7:47 ` Jennifer Schmitz
0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2024-07-29 20:55 UTC (permalink / raw)
To: Jennifer Schmitz; +Cc: gcc-patches, Kyrylo Tkachov
Thanks for doing this.
Jennifer Schmitz <jschmitz@nvidia.com> writes:
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
> index c49ca1aa524..6500b64c41b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
> @@ -1,6 +1,9 @@
> /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>
> #include "test_sve_acle.h"
> +#include <stdint.h>
> +
I think it'd be better to drop the explicit include of stdint.h. arm_sve.h
is defined to include stdint.h itself, and we rely on that elsewhere.
Same for div_s64.c.
> +#define MAXPOW 1<<30
>
> /*
> ** div_s32_m_tied1:
> @@ -53,10 +56,27 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
> z0 = svdiv_n_s32_m (p0, z1, x0),
> z0 = svdiv_m (p0, z1, x0))
>
> +/*
> +** div_1_s32_m_tied1:
> +** sel z0\.s, p0, z0\.s, z0\.s
> +** ret
> +*/
> +TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
> + z0 = svdiv_n_s32_m (p0, z0, 1),
> + z0 = svdiv_m (p0, z0, 1))
> +
> +/*
> +** div_1_s32_m_untied:
> +** sel z0\.s, p0, z1\.s, z1\.s
> +** ret
> +*/
> +TEST_UNIFORM_Z (div_1_s32_m_untied, svint32_t,
> + z0 = svdiv_n_s32_m (p0, z1, 1),
> + z0 = svdiv_m (p0, z1, 1))
> +
[ Thanks for adding the tests (which look good to me). If the output
ever improves in future, we can "defend" the improvement by changing
the test. But in the meantime, the above defends something that is
known to work. ]
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
> new file mode 100644
> index 00000000000..1a3c25b817d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
> @@ -0,0 +1,91 @@
> +/* { dg-do run { target aarch64_sve128_hw } } */
> +/* { dg-options "-O2 -msve-vector-bits=128" } */
> +
> +#include <arm_sve.h>
> +#include <stdint.h>
> +
> +typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
> +typedef svfloat16_t svfloat16_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svfloat32_t svfloat32_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svfloat64_t svfloat64_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svint32_t svint32_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svint64_t svint64_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svuint32_t svuint32_ __attribute__((arm_sve_vector_bits(128)));
> +typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128)));
> +
> +#define F(T, TS, P, OP1, OP2) \
> +{ \
> + T##_t op1 = (T##_t) OP1; \
> + T##_t op2 = (T##_t) OP2; \
> + sv##T##_ res = svdiv_##P (pg, svdup_##TS (op1), svdup_##TS (op2)); \
> + sv##T##_ exp = svdup_##TS (op1 / op2); \
> + if (svptest_any (pg, svcmpne (pg, exp, res))) \
> + __builtin_abort (); \
> + \
> + sv##T##_ res_n = svdiv_##P (pg, svdup_##TS (op1), op2); \
> + if (svptest_any (pg, svcmpne (pg, exp, res_n))) \
> + __builtin_abort (); \
> +}
> +
> +#define TEST_TYPES_1(T, TS) \
> + F (T, TS, m, 79, 16) \
> + F (T, TS, z, 79, 16) \
> + F (T, TS, x, 79, 16)
> +
> +#define TEST_TYPES \
> + TEST_TYPES_1 (float16, f16) \
> + TEST_TYPES_1 (float32, f32) \
> + TEST_TYPES_1 (float64, f64) \
> + TEST_TYPES_1 (int32, s32) \
> + TEST_TYPES_1 (int64, s64) \
> + TEST_TYPES_1 (uint32, u32) \
> + TEST_TYPES_1 (uint64, u64)
> +
> +#define TEST_VALUES_S_1(B, OP1, OP2) \
> + F (int##B, s##B, x, OP1, OP2)
> +
> +#define TEST_VALUES_S \
> + TEST_VALUES_S_1 (32, INT32_MIN, INT32_MIN) \
> + TEST_VALUES_S_1 (64, INT64_MIN, INT64_MIN) \
> + TEST_VALUES_S_1 (32, -7, 4) \
> + TEST_VALUES_S_1 (64, -7, 4) \
> + TEST_VALUES_S_1 (32, INT32_MAX, (1 << 30)) \
> + TEST_VALUES_S_1 (64, INT64_MAX, (1ULL << 62)) \
> + TEST_VALUES_S_1 (32, INT32_MIN, (1 << 30)) \
> + TEST_VALUES_S_1 (64, INT64_MIN, (1ULL << 62)) \
> + TEST_VALUES_S_1 (32, INT32_MAX, 1) \
> + TEST_VALUES_S_1 (64, INT64_MAX, 1) \
> + TEST_VALUES_S_1 (32, INT32_MIN, 16) \
> + TEST_VALUES_S_1 (64, INT64_MIN, 16) \
> + TEST_VALUES_S_1 (32, INT32_MAX, -5) \
> + TEST_VALUES_S_1 (64, INT64_MAX, -5) \
> + TEST_VALUES_S_1 (32, INT32_MIN, -4) \
> + TEST_VALUES_S_1 (64, INT64_MIN, -4)
> +
> +#define TEST_VALUES_U_1(B, OP1, OP2) \
> + F (uint##B, u##B, x, OP1, OP2)
> +
> +#define TEST_VALUES_U \
> + TEST_VALUES_U_1 (32, UINT32_MAX, UINT32_MAX) \
> + TEST_VALUES_U_1 (64, UINT64_MAX, UINT64_MAX) \
> + TEST_VALUES_U_1 (32, UINT32_MAX, (1 << 31)) \
> + TEST_VALUES_U_1 (64, UINT64_MAX, (1ULL << 63)) \
> + TEST_VALUES_U_1 (32, 7, 4) \
> + TEST_VALUES_U_1 (64, 7, 4) \
> + TEST_VALUES_U_1 (32, 7, 3) \
> + TEST_VALUES_U_1 (64, 7, 3) \
> + TEST_VALUES_U_1 (32, 11, 1) \
> + TEST_VALUES_U_1 (64, 11, 1)
> +
> +#define TEST_VALUES \
> + TEST_VALUES_S \
> + TEST_VALUES_U
> +
> +int
> +main (void)
> +{
> + const pred pg = svptrue_b64 ();
I think this should be svptrue_b8 instead. As it stands, the:
  if (svptest_any (pg, svcmpne (pg, ...)))
    __builtin_abort ();
tests will only check the first element in each 64-bit chunk.
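(For illustration, a minimal version of that change in the test's main would be:

  /* All-true at byte granularity, so every lane is active for all
     element widths used by svcmpne.  */
  const pred pg = svptrue_b8 ();

replacing the current svptrue_b64 call.)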
OK with those changes from my POV, but please give others 24 hours
to comment.
Thanks,
Richard
> + TEST_TYPES
> + TEST_VALUES
> + return 0;
> +}
* Re: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.
2024-07-29 20:55 ` Richard Sandiford
@ 2024-07-30 7:47 ` Jennifer Schmitz
2024-07-30 11:22 ` Kyrylo Tkachov
0 siblings, 1 reply; 6+ messages in thread
From: Jennifer Schmitz @ 2024-07-30 7:47 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Kyrylo Tkachov
[-- Attachment #1.1: Type: text/plain, Size: 125 bytes --]
Dear Richard,
Thanks for the feedback. Great to see this patch approved! I made the changes as suggested.
Best,
Jennifer
[-- Attachment #1.2: 0001-SVE-intrinsics-Add-strength-reduction-for-division-b.patch --]
[-- Type: application/octet-stream, Size: 37908 bytes --]
From 36fa1321a94fc5d2af11b2d34de885825befd3a4 Mon Sep 17 00:00:00 2001
From: Jennifer Schmitz <jschmitz@nvidia.com>
Date: Tue, 16 Jul 2024 01:59:50 -0700
Subject: [PATCH] SVE intrinsics: Add strength reduction for division by
constant.
This patch folds SVE division where all divisor elements are the same
power of 2 to svasrd (signed) or svlsr (unsigned).
Tests were added to check
1) whether the transform is applied (existing test harness was amended), and
2) correctness using runtime tests for all input types of svdiv; for signed
and unsigned integers, several corner cases were covered.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Implement strength reduction.
gcc/testsuite/
* gcc.target/aarch64/sve/div_const_run.c: New test.
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
---
.../aarch64/aarch64-sve-builtins-base.cc | 49 +++-
.../gcc.target/aarch64/sve/acle/asm/div_s32.c | 273 +++++++++++++++++-
.../gcc.target/aarch64/sve/acle/asm/div_s64.c | 273 +++++++++++++++++-
.../gcc.target/aarch64/sve/acle/asm/div_u32.c | 201 ++++++++++++-
.../gcc.target/aarch64/sve/acle/asm/div_u64.c | 201 ++++++++++++-
.../gcc.target/aarch64/sve/div_const_run.c | 91 ++++++
6 files changed, 1031 insertions(+), 57 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index aa26370d397..41a7b4cd861 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -746,6 +746,53 @@ public:
}
};
+class svdiv_impl : public rtx_code_function
+{
+public:
+ CONSTEXPR svdiv_impl ()
+ : rtx_code_function (DIV, UDIV, UNSPEC_COND_FDIV) {}
+
+ gimple *
+ fold (gimple_folder &f) const override
+ {
+ tree divisor = gimple_call_arg (f.call, 2);
+ tree divisor_cst = uniform_integer_cst_p (divisor);
+
+ if (!divisor_cst || !integer_pow2p (divisor_cst))
+ return NULL;
+
+ tree new_divisor;
+ gcall *call;
+
+ if (f.type_suffix (0).unsigned_p && tree_to_uhwi (divisor_cst) != 1)
+ {
+ function_instance instance ("svlsr", functions::svlsr,
+ shapes::binary_uint_opt_n, MODE_n,
+ f.type_suffix_ids, GROUP_none, f.pred);
+ call = f.redirect_call (instance);
+ tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
+ new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
+ }
+ else
+ {
+ if (tree_int_cst_sign_bit (divisor_cst)
+ || tree_to_shwi (divisor_cst) == 1)
+ return NULL;
+
+ function_instance instance ("svasrd", functions::svasrd,
+ shapes::shift_right_imm, MODE_n,
+ f.type_suffix_ids, GROUP_none, f.pred);
+ call = f.redirect_call (instance);
+ new_divisor = wide_int_to_tree (scalar_types[VECTOR_TYPE_svuint64_t],
+ tree_log2 (divisor_cst));
+ }
+
+ gimple_call_set_arg (call, 2, new_divisor);
+ return call;
+ }
+};
+
+
class svdot_impl : public function_base
{
public:
@@ -3043,7 +3090,7 @@ FUNCTION (svcreate3, svcreate_impl, (3))
FUNCTION (svcreate4, svcreate_impl, (4))
FUNCTION (svcvt, svcvt_impl,)
FUNCTION (svcvtnt, CODE_FOR_MODE0 (aarch64_sve_cvtnt),)
-FUNCTION (svdiv, rtx_code_function, (DIV, UDIV, UNSPEC_COND_FDIV))
+FUNCTION (svdiv, svdiv_impl,)
FUNCTION (svdivr, rtx_code_function_rotated, (DIV, UDIV, UNSPEC_COND_FDIV))
FUNCTION (svdot, svdot_impl,)
FUNCTION (svdot_lane, svdotprod_lane_impl, (UNSPEC_SDOT, UNSPEC_UDOT,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
index c49ca1aa524..d5a23bf0726 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
@@ -2,6 +2,8 @@
#include "test_sve_acle.h"
+#define MAXPOW 1<<30
+
/*
** div_s32_m_tied1:
** sdiv z0\.s, p0/m, z0\.s, z1\.s
@@ -53,10 +55,27 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
z0 = svdiv_n_s32_m (p0, z1, x0),
z0 = svdiv_m (p0, z1, x0))
+/*
+** div_1_s32_m_tied1:
+** sel z0\.s, p0, z0\.s, z0\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, 1),
+ z0 = svdiv_m (p0, z0, 1))
+
+/*
+** div_1_s32_m_untied:
+** sel z0\.s, p0, z1\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, 1),
+ z0 = svdiv_m (p0, z1, 1))
+
/*
** div_2_s32_m_tied1:
-** mov (z[0-9]+\.s), #2
-** sdiv z0\.s, p0/m, z0\.s, \1
+** asrd z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s32_m_tied1, svint32_t,
@@ -65,15 +84,75 @@ TEST_UNIFORM_Z (div_2_s32_m_tied1, svint32_t,
/*
** div_2_s32_m_untied:
-** mov (z[0-9]+\.s), #2
** movprfx z0, z1
-** sdiv z0\.s, p0/m, z0\.s, \1
+** asrd z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s32_m_untied, svint32_t,
z0 = svdiv_n_s32_m (p0, z1, 2),
z0 = svdiv_m (p0, z1, 2))
+/*
+** div_3_s32_m_tied1:
+** mov (z[0-9]+\.s), #3
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
+
+/*
+** div_3_s32_m_untied:
+** mov (z[0-9]+\.s), #3
+** movprfx z0, z1
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
+
+/*
+** div_maxpow_s32_m_tied1:
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, MAXPOW),
+ z0 = svdiv_m (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s32_m_untied:
+** movprfx z0, z1
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, MAXPOW),
+ z0 = svdiv_m (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s32_m_tied1:
+** mov (z[0-9]+\.s), #-2147483648
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_m_tied1, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z0, INT32_MIN),
+ z0 = svdiv_m (p0, z0, INT32_MIN))
+
+/*
+** div_intmin_s32_m_untied:
+** mov (z[0-9]+\.s), #-2147483648
+** movprfx z0, z1
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_m_untied, svint32_t,
+ z0 = svdiv_n_s32_m (p0, z1, INT32_MIN),
+ z0 = svdiv_m (p0, z1, INT32_MIN))
+
/*
** div_s32_z_tied1:
** movprfx z0\.s, p0/z, z0\.s
@@ -137,19 +216,61 @@ TEST_UNIFORM_ZX (div_w0_s32_z_untied, svint32_t, int32_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_s32_z_tied1:
-** mov (z[0-9]+\.s), #2
+** div_1_s32_z_tied1:
+** mov (z[0-9]+\.s), #1
** movprfx z0\.s, p0/z, z0\.s
** sdiv z0\.s, p0/m, z0\.s, \1
** ret
*/
+TEST_UNIFORM_Z (div_1_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, 1),
+ z0 = svdiv_z (p0, z0, 1))
+
+/*
+** div_1_s32_z_untied:
+** mov z0\.s, #1
+** movprfx z0\.s, p0/z, z0\.s
+** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, 1),
+ z0 = svdiv_z (p0, z1, 1))
+
+/*
+** div_2_s32_z_tied1:
+** movprfx z0\.s, p0/z, z0\.s
+** asrd z0\.s, p0/m, z0\.s, #1
+** ret
+*/
TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
z0 = svdiv_n_s32_z (p0, z0, 2),
z0 = svdiv_z (p0, z0, 2))
/*
** div_2_s32_z_untied:
-** mov (z[0-9]+\.s), #2
+** movprfx z0\.s, p0/z, z1\.s
+** asrd z0\.s, p0/m, z0\.s, #1
+** ret
+*/
+TEST_UNIFORM_Z (div_2_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, 2),
+ z0 = svdiv_z (p0, z1, 2))
+
+/*
+** div_3_s32_z_tied1:
+** mov (z[0-9]+\.s), #3
+** movprfx z0\.s, p0/z, z0\.s
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
+
+/*
+** div_3_s32_z_untied:
+** mov (z[0-9]+\.s), #3
** (
** movprfx z0\.s, p0/z, z1\.s
** sdiv z0\.s, p0/m, z0\.s, \1
@@ -159,9 +280,56 @@ TEST_UNIFORM_Z (div_2_s32_z_tied1, svint32_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_s32_z_untied, svint32_t,
- z0 = svdiv_n_s32_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
+
+/*
+** div_maxpow_s32_z_tied1:
+** movprfx z0\.s, p0/z, z0\.s
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, MAXPOW),
+ z0 = svdiv_z (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s32_z_untied:
+** movprfx z0\.s, p0/z, z1\.s
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, MAXPOW),
+ z0 = svdiv_z (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s32_z_tied1:
+** mov (z[0-9]+\.s), #-2147483648
+** movprfx z0\.s, p0/z, z0\.s
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_z_tied1, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z0, INT32_MIN),
+ z0 = svdiv_z (p0, z0, INT32_MIN))
+
+/*
+** div_intmin_s32_z_untied:
+** mov (z[0-9]+\.s), #-2147483648
+** (
+** movprfx z0\.s, p0/z, z1\.s
+** sdiv z0\.s, p0/m, z0\.s, \1
+** |
+** movprfx z0\.s, p0/z, \1
+** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** )
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_z_untied, svint32_t,
+ z0 = svdiv_n_s32_z (p0, z1, INT32_MIN),
+ z0 = svdiv_z (p0, z1, INT32_MIN))
/*
** div_s32_x_tied1:
@@ -216,10 +384,26 @@ TEST_UNIFORM_ZX (div_w0_s32_x_untied, svint32_t, int32_t,
z0 = svdiv_n_s32_x (p0, z1, x0),
z0 = svdiv_x (p0, z1, x0))
+/*
+** div_1_s32_x_tied1:
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, 1),
+ z0 = svdiv_x (p0, z0, 1))
+
+/*
+** div_1_s32_x_untied:
+** mov z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, 1),
+ z0 = svdiv_x (p0, z1, 1))
+
/*
** div_2_s32_x_tied1:
-** mov (z[0-9]+\.s), #2
-** sdiv z0\.s, p0/m, z0\.s, \1
+** asrd z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s32_x_tied1, svint32_t,
@@ -228,10 +412,71 @@ TEST_UNIFORM_Z (div_2_s32_x_tied1, svint32_t,
/*
** div_2_s32_x_untied:
-** mov z0\.s, #2
-** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** movprfx z0, z1
+** asrd z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s32_x_untied, svint32_t,
z0 = svdiv_n_s32_x (p0, z1, 2),
z0 = svdiv_x (p0, z1, 2))
+
+/*
+** div_3_s32_x_tied1:
+** mov (z[0-9]+\.s), #3
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
+
+/*
+** div_3_s32_x_untied:
+** mov z0\.s, #3
+** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
+
+/*
+** div_maxpow_s32_x_tied1:
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, MAXPOW),
+ z0 = svdiv_x (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s32_x_untied:
+** movprfx z0, z1
+** asrd z0\.s, p0/m, z0\.s, #30
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, MAXPOW),
+ z0 = svdiv_x (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s32_x_tied1:
+** mov (z[0-9]+\.s), #-2147483648
+** sdiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_x_tied1, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z0, INT32_MIN),
+ z0 = svdiv_x (p0, z0, INT32_MIN))
+
+/*
+** div_intmin_s32_x_untied:
+** mov z0\.s, #-2147483648
+** sdivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s32_x_untied, svint32_t,
+ z0 = svdiv_n_s32_x (p0, z1, INT32_MIN),
+ z0 = svdiv_x (p0, z1, INT32_MIN))
+
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
index 464dca28d74..cfed6f9c1b3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s64.c
@@ -2,6 +2,8 @@
#include "test_sve_acle.h"
+#define MAXPOW 1ULL<<62
+
/*
** div_s64_m_tied1:
** sdiv z0\.d, p0/m, z0\.d, z1\.d
@@ -53,10 +55,27 @@ TEST_UNIFORM_ZX (div_x0_s64_m_untied, svint64_t, int64_t,
z0 = svdiv_n_s64_m (p0, z1, x0),
z0 = svdiv_m (p0, z1, x0))
+/*
+** div_1_s64_m_tied1:
+** sel z0\.d, p0, z0\.d, z0\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, 1),
+ z0 = svdiv_m (p0, z0, 1))
+
+/*
+** div_1_s64_m_untied:
+** sel z0\.d, p0, z1\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, 1),
+ z0 = svdiv_m (p0, z1, 1))
+
/*
** div_2_s64_m_tied1:
-** mov (z[0-9]+\.d), #2
-** sdiv z0\.d, p0/m, z0\.d, \1
+** asrd z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s64_m_tied1, svint64_t,
@@ -65,15 +84,75 @@ TEST_UNIFORM_Z (div_2_s64_m_tied1, svint64_t,
/*
** div_2_s64_m_untied:
-** mov (z[0-9]+\.d), #2
** movprfx z0, z1
-** sdiv z0\.d, p0/m, z0\.d, \1
+** asrd z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s64_m_untied, svint64_t,
z0 = svdiv_n_s64_m (p0, z1, 2),
z0 = svdiv_m (p0, z1, 2))
+/*
+** div_3_s64_m_tied1:
+** mov (z[0-9]+\.d), #3
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
+
+/*
+** div_3_s64_m_untied:
+** mov (z[0-9]+\.d), #3
+** movprfx z0, z1
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
+
+/*
+** div_maxpow_s64_m_tied1:
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, MAXPOW),
+ z0 = svdiv_m (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s64_m_untied:
+** movprfx z0, z1
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, MAXPOW),
+ z0 = svdiv_m (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s64_m_tied1:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_m_tied1, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z0, INT64_MIN),
+ z0 = svdiv_m (p0, z0, INT64_MIN))
+
+/*
+** div_intmin_s64_m_untied:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** movprfx z0, z1
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_m_untied, svint64_t,
+ z0 = svdiv_n_s64_m (p0, z1, INT64_MIN),
+ z0 = svdiv_m (p0, z1, INT64_MIN))
+
/*
** div_s64_z_tied1:
** movprfx z0\.d, p0/z, z0\.d
@@ -137,19 +216,61 @@ TEST_UNIFORM_ZX (div_x0_s64_z_untied, svint64_t, int64_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_s64_z_tied1:
-** mov (z[0-9]+\.d), #2
+** div_1_s64_z_tied1:
+** mov (z[0-9]+\.d), #1
** movprfx z0\.d, p0/z, z0\.d
** sdiv z0\.d, p0/m, z0\.d, \1
** ret
*/
+TEST_UNIFORM_Z (div_1_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, 1),
+ z0 = svdiv_z (p0, z0, 1))
+
+/*
+** div_1_s64_z_untied:
+** mov z0\.d, #1
+** movprfx z0\.d, p0/z, z0\.d
+** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, 1),
+ z0 = svdiv_z (p0, z1, 1))
+
+/*
+** div_2_s64_z_tied1:
+** movprfx z0\.d, p0/z, z0\.d
+** asrd z0\.d, p0/m, z0\.d, #1
+** ret
+*/
TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
z0 = svdiv_n_s64_z (p0, z0, 2),
z0 = svdiv_z (p0, z0, 2))
/*
** div_2_s64_z_untied:
-** mov (z[0-9]+\.d), #2
+** movprfx z0\.d, p0/z, z1\.d
+** asrd z0\.d, p0/m, z0\.d, #1
+** ret
+*/
+TEST_UNIFORM_Z (div_2_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, 2),
+ z0 = svdiv_z (p0, z1, 2))
+
+/*
+** div_3_s64_z_tied1:
+** mov (z[0-9]+\.d), #3
+** movprfx z0\.d, p0/z, z0\.d
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
+
+/*
+** div_3_s64_z_untied:
+** mov (z[0-9]+\.d), #3
** (
** movprfx z0\.d, p0/z, z1\.d
** sdiv z0\.d, p0/m, z0\.d, \1
@@ -159,9 +280,56 @@ TEST_UNIFORM_Z (div_2_s64_z_tied1, svint64_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_s64_z_untied, svint64_t,
- z0 = svdiv_n_s64_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
+
+/*
+** div_maxpow_s64_z_tied1:
+** movprfx z0\.d, p0/z, z0\.d
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, MAXPOW),
+ z0 = svdiv_z (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s64_z_untied:
+** movprfx z0\.d, p0/z, z1\.d
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, MAXPOW),
+ z0 = svdiv_z (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s64_z_tied1:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** movprfx z0\.d, p0/z, z0\.d
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_z_tied1, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z0, INT64_MIN),
+ z0 = svdiv_z (p0, z0, INT64_MIN))
+
+/*
+** div_intmin_s64_z_untied:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** (
+** movprfx z0\.d, p0/z, z1\.d
+** sdiv z0\.d, p0/m, z0\.d, \1
+** |
+** movprfx z0\.d, p0/z, \1
+** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** )
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_z_untied, svint64_t,
+ z0 = svdiv_n_s64_z (p0, z1, INT64_MIN),
+ z0 = svdiv_z (p0, z1, INT64_MIN))
/*
** div_s64_x_tied1:
@@ -216,10 +384,26 @@ TEST_UNIFORM_ZX (div_x0_s64_x_untied, svint64_t, int64_t,
z0 = svdiv_n_s64_x (p0, z1, x0),
z0 = svdiv_x (p0, z1, x0))
+/*
+** div_1_s64_x_tied1:
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, 1),
+ z0 = svdiv_x (p0, z0, 1))
+
+/*
+** div_1_s64_x_untied:
+** mov z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, 1),
+ z0 = svdiv_x (p0, z1, 1))
+
/*
** div_2_s64_x_tied1:
-** mov (z[0-9]+\.d), #2
-** sdiv z0\.d, p0/m, z0\.d, \1
+** asrd z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s64_x_tied1, svint64_t,
@@ -228,10 +412,71 @@ TEST_UNIFORM_Z (div_2_s64_x_tied1, svint64_t,
/*
** div_2_s64_x_untied:
-** mov z0\.d, #2
-** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** movprfx z0, z1
+** asrd z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_s64_x_untied, svint64_t,
z0 = svdiv_n_s64_x (p0, z1, 2),
z0 = svdiv_x (p0, z1, 2))
+
+/*
+** div_3_s64_x_tied1:
+** mov (z[0-9]+\.d), #3
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
+
+/*
+** div_3_s64_x_untied:
+** mov z0\.d, #3
+** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_3_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
+
+/*
+** div_maxpow_s64_x_tied1:
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, MAXPOW),
+ z0 = svdiv_x (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_s64_x_untied:
+** movprfx z0, z1
+** asrd z0\.d, p0/m, z0\.d, #62
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, MAXPOW),
+ z0 = svdiv_x (p0, z1, MAXPOW))
+
+/*
+** div_intmin_s64_x_tied1:
+** mov (z[0-9]+\.d), #-9223372036854775808
+** sdiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_x_tied1, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z0, INT64_MIN),
+ z0 = svdiv_x (p0, z0, INT64_MIN))
+
+/*
+** div_intmin_s64_x_untied:
+** mov z0\.d, #-9223372036854775808
+** sdivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_intmin_s64_x_untied, svint64_t,
+ z0 = svdiv_n_s64_x (p0, z1, INT64_MIN),
+ z0 = svdiv_x (p0, z1, INT64_MIN))
+
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
index 232ccacf524..9707664caf4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
@@ -2,6 +2,8 @@
#include "test_sve_acle.h"
+#define MAXPOW 1<<31
+
/*
** div_u32_m_tied1:
** udiv z0\.s, p0/m, z0\.s, z1\.s
@@ -53,10 +55,27 @@ TEST_UNIFORM_ZX (div_w0_u32_m_untied, svuint32_t, uint32_t,
z0 = svdiv_n_u32_m (p0, z1, x0),
z0 = svdiv_m (p0, z1, x0))
+/*
+** div_1_u32_m_tied1:
+** sel z0\.s, p0, z0\.s, z0\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_m_tied1, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z0, 1),
+ z0 = svdiv_m (p0, z0, 1))
+
+/*
+** div_1_u32_m_untied:
+** sel z0\.s, p0, z1\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_m_untied, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z1, 1),
+ z0 = svdiv_m (p0, z1, 1))
+
/*
** div_2_u32_m_tied1:
-** mov (z[0-9]+\.s), #2
-** udiv z0\.s, p0/m, z0\.s, \1
+** lsr z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u32_m_tied1, svuint32_t,
@@ -65,15 +84,54 @@ TEST_UNIFORM_Z (div_2_u32_m_tied1, svuint32_t,
/*
** div_2_u32_m_untied:
-** mov (z[0-9]+\.s), #2
** movprfx z0, z1
-** udiv z0\.s, p0/m, z0\.s, \1
+** lsr z0\.s, p0/m, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u32_m_untied, svuint32_t,
z0 = svdiv_n_u32_m (p0, z1, 2),
z0 = svdiv_m (p0, z1, 2))
+/*
+** div_3_u32_m_tied1:
+** mov (z[0-9]+\.s), #3
+** udiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_m_tied1, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
+
+/*
+** div_3_u32_m_untied:
+** mov (z[0-9]+\.s), #3
+** movprfx z0, z1
+** udiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_m_untied, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
+
+/*
+** div_maxpow_u32_m_tied1:
+** lsr z0\.s, p0/m, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_m_tied1, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z0, MAXPOW),
+ z0 = svdiv_m (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u32_m_untied:
+** movprfx z0, z1
+** lsr z0\.s, p0/m, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_m_untied, svuint32_t,
+ z0 = svdiv_n_u32_m (p0, z1, MAXPOW),
+ z0 = svdiv_m (p0, z1, MAXPOW))
+
/*
** div_u32_z_tied1:
** movprfx z0\.s, p0/z, z0\.s
@@ -137,19 +195,61 @@ TEST_UNIFORM_ZX (div_w0_u32_z_untied, svuint32_t, uint32_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_u32_z_tied1:
-** mov (z[0-9]+\.s), #2
+** div_1_u32_z_tied1:
+** mov (z[0-9]+\.s), #1
** movprfx z0\.s, p0/z, z0\.s
** udiv z0\.s, p0/m, z0\.s, \1
** ret
*/
+TEST_UNIFORM_Z (div_1_u32_z_tied1, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z0, 1),
+ z0 = svdiv_z (p0, z0, 1))
+
+/*
+** div_1_u32_z_untied:
+** mov z0\.s, #1
+** movprfx z0\.s, p0/z, z0\.s
+** udivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_z_untied, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z1, 1),
+ z0 = svdiv_z (p0, z1, 1))
+
+/*
+** div_2_u32_z_tied1:
+** movprfx z0\.s, p0/z, z0\.s
+** lsr z0\.s, p0/m, z0\.s, #1
+** ret
+*/
TEST_UNIFORM_Z (div_2_u32_z_tied1, svuint32_t,
z0 = svdiv_n_u32_z (p0, z0, 2),
z0 = svdiv_z (p0, z0, 2))
/*
** div_2_u32_z_untied:
-** mov (z[0-9]+\.s), #2
+** movprfx z0\.s, p0/z, z1\.s
+** lsr z0\.s, p0/m, z0\.s, #1
+** ret
+*/
+TEST_UNIFORM_Z (div_2_u32_z_untied, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z1, 2),
+ z0 = svdiv_z (p0, z1, 2))
+
+/*
+** div_3_u32_z_tied1:
+** mov (z[0-9]+\.s), #3
+** movprfx z0\.s, p0/z, z0\.s
+** udiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_z_tied1, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
+
+/*
+** div_3_u32_z_untied:
+** mov (z[0-9]+\.s), #3
** (
** movprfx z0\.s, p0/z, z1\.s
** udiv z0\.s, p0/m, z0\.s, \1
@@ -159,9 +259,29 @@ TEST_UNIFORM_Z (div_2_u32_z_tied1, svuint32_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_u32_z_untied, svuint32_t,
- z0 = svdiv_n_u32_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_u32_z_untied, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
+
+/*
+** div_maxpow_u32_z_tied1:
+** movprfx z0\.s, p0/z, z0\.s
+** lsr z0\.s, p0/m, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_z_tied1, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z0, MAXPOW),
+ z0 = svdiv_z (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u32_z_untied:
+** movprfx z0\.s, p0/z, z1\.s
+** lsr z0\.s, p0/m, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_z_untied, svuint32_t,
+ z0 = svdiv_n_u32_z (p0, z1, MAXPOW),
+ z0 = svdiv_z (p0, z1, MAXPOW))
/*
** div_u32_x_tied1:
@@ -216,10 +336,26 @@ TEST_UNIFORM_ZX (div_w0_u32_x_untied, svuint32_t, uint32_t,
z0 = svdiv_n_u32_x (p0, z1, x0),
z0 = svdiv_x (p0, z1, x0))
+/*
+** div_1_u32_x_tied1:
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_x_tied1, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z0, 1),
+ z0 = svdiv_x (p0, z0, 1))
+
+/*
+** div_1_u32_x_untied:
+** mov z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u32_x_untied, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z1, 1),
+ z0 = svdiv_x (p0, z1, 1))
+
/*
** div_2_u32_x_tied1:
-** mov (z[0-9]+\.s), #2
-** udiv z0\.s, p0/m, z0\.s, \1
+** lsr z0\.s, z0\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u32_x_tied1, svuint32_t,
@@ -228,10 +364,47 @@ TEST_UNIFORM_Z (div_2_u32_x_tied1, svuint32_t,
/*
** div_2_u32_x_untied:
-** mov z0\.s, #2
-** udivr z0\.s, p0/m, z0\.s, z1\.s
+** lsr z0\.s, z1\.s, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u32_x_untied, svuint32_t,
z0 = svdiv_n_u32_x (p0, z1, 2),
z0 = svdiv_x (p0, z1, 2))
+
+/*
+** div_3_u32_x_tied1:
+** mov (z[0-9]+\.s), #3
+** udiv z0\.s, p0/m, z0\.s, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_x_tied1, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
+
+/*
+** div_3_u32_x_untied:
+** mov z0\.s, #3
+** udivr z0\.s, p0/m, z0\.s, z1\.s
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u32_x_untied, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
+
+/*
+** div_maxpow_u32_x_tied1:
+** lsr z0\.s, z0\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_x_tied1, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z0, MAXPOW),
+ z0 = svdiv_x (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u32_x_untied:
+** lsr z0\.s, z1\.s, #31
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u32_x_untied, svuint32_t,
+ z0 = svdiv_n_u32_x (p0, z1, MAXPOW),
+ z0 = svdiv_x (p0, z1, MAXPOW))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
index ac7c026eea3..5247ebdac7a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
@@ -2,6 +2,8 @@
#include "test_sve_acle.h"
+#define MAXPOW 1ULL<<63
+
/*
** div_u64_m_tied1:
** udiv z0\.d, p0/m, z0\.d, z1\.d
@@ -53,10 +55,27 @@ TEST_UNIFORM_ZX (div_x0_u64_m_untied, svuint64_t, uint64_t,
z0 = svdiv_n_u64_m (p0, z1, x0),
z0 = svdiv_m (p0, z1, x0))
+/*
+** div_1_u64_m_tied1:
+** sel z0\.d, p0, z0\.d, z0\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_m_tied1, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z0, 1),
+ z0 = svdiv_m (p0, z0, 1))
+
+/*
+** div_1_u64_m_untied:
+** sel z0\.d, p0, z1\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_m_untied, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z1, 1),
+ z0 = svdiv_m (p0, z1, 1))
+
/*
** div_2_u64_m_tied1:
-** mov (z[0-9]+\.d), #2
-** udiv z0\.d, p0/m, z0\.d, \1
+** lsr z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u64_m_tied1, svuint64_t,
@@ -65,15 +84,54 @@ TEST_UNIFORM_Z (div_2_u64_m_tied1, svuint64_t,
/*
** div_2_u64_m_untied:
-** mov (z[0-9]+\.d), #2
** movprfx z0, z1
-** udiv z0\.d, p0/m, z0\.d, \1
+** lsr z0\.d, p0/m, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u64_m_untied, svuint64_t,
z0 = svdiv_n_u64_m (p0, z1, 2),
z0 = svdiv_m (p0, z1, 2))
+/*
+** div_3_u64_m_tied1:
+** mov (z[0-9]+\.d), #3
+** udiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_m_tied1, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z0, 3),
+ z0 = svdiv_m (p0, z0, 3))
+
+/*
+** div_3_u64_m_untied:
+** mov (z[0-9]+\.d), #3
+** movprfx z0, z1
+** udiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_m_untied, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z1, 3),
+ z0 = svdiv_m (p0, z1, 3))
+
+/*
+** div_maxpow_u64_m_tied1:
+** lsr z0\.d, p0/m, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_m_tied1, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z0, MAXPOW),
+ z0 = svdiv_m (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u64_m_untied:
+** movprfx z0, z1
+** lsr z0\.d, p0/m, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_m_untied, svuint64_t,
+ z0 = svdiv_n_u64_m (p0, z1, MAXPOW),
+ z0 = svdiv_m (p0, z1, MAXPOW))
+
/*
** div_u64_z_tied1:
** movprfx z0\.d, p0/z, z0\.d
@@ -137,19 +195,61 @@ TEST_UNIFORM_ZX (div_x0_u64_z_untied, svuint64_t, uint64_t,
z0 = svdiv_z (p0, z1, x0))
/*
-** div_2_u64_z_tied1:
-** mov (z[0-9]+\.d), #2
+** div_1_u64_z_tied1:
+** mov (z[0-9]+\.d), #1
** movprfx z0\.d, p0/z, z0\.d
** udiv z0\.d, p0/m, z0\.d, \1
** ret
*/
+TEST_UNIFORM_Z (div_1_u64_z_tied1, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z0, 1),
+ z0 = svdiv_z (p0, z0, 1))
+
+/*
+** div_1_u64_z_untied:
+** mov z0\.d, #1
+** movprfx z0\.d, p0/z, z0\.d
+** udivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_z_untied, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z1, 1),
+ z0 = svdiv_z (p0, z1, 1))
+
+/*
+** div_2_u64_z_tied1:
+** movprfx z0\.d, p0/z, z0\.d
+** lsr z0\.d, p0/m, z0\.d, #1
+** ret
+*/
TEST_UNIFORM_Z (div_2_u64_z_tied1, svuint64_t,
z0 = svdiv_n_u64_z (p0, z0, 2),
z0 = svdiv_z (p0, z0, 2))
/*
** div_2_u64_z_untied:
-** mov (z[0-9]+\.d), #2
+** movprfx z0\.d, p0/z, z1\.d
+** lsr z0\.d, p0/m, z0\.d, #1
+** ret
+*/
+TEST_UNIFORM_Z (div_2_u64_z_untied, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z1, 2),
+ z0 = svdiv_z (p0, z1, 2))
+
+/*
+** div_3_u64_z_tied1:
+** mov (z[0-9]+\.d), #3
+** movprfx z0\.d, p0/z, z0\.d
+** udiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_z_tied1, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z0, 3),
+ z0 = svdiv_z (p0, z0, 3))
+
+/*
+** div_3_u64_z_untied:
+** mov (z[0-9]+\.d), #3
** (
** movprfx z0\.d, p0/z, z1\.d
** udiv z0\.d, p0/m, z0\.d, \1
@@ -159,9 +259,29 @@ TEST_UNIFORM_Z (div_2_u64_z_tied1, svuint64_t,
** )
** ret
*/
-TEST_UNIFORM_Z (div_2_u64_z_untied, svuint64_t,
- z0 = svdiv_n_u64_z (p0, z1, 2),
- z0 = svdiv_z (p0, z1, 2))
+TEST_UNIFORM_Z (div_3_u64_z_untied, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z1, 3),
+ z0 = svdiv_z (p0, z1, 3))
+
+/*
+** div_maxpow_u64_z_tied1:
+** movprfx z0\.d, p0/z, z0\.d
+** lsr z0\.d, p0/m, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_z_tied1, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z0, MAXPOW),
+ z0 = svdiv_z (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u64_z_untied:
+** movprfx z0\.d, p0/z, z1\.d
+** lsr z0\.d, p0/m, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_z_untied, svuint64_t,
+ z0 = svdiv_n_u64_z (p0, z1, MAXPOW),
+ z0 = svdiv_z (p0, z1, MAXPOW))
/*
** div_u64_x_tied1:
@@ -216,10 +336,26 @@ TEST_UNIFORM_ZX (div_x0_u64_x_untied, svuint64_t, uint64_t,
z0 = svdiv_n_u64_x (p0, z1, x0),
z0 = svdiv_x (p0, z1, x0))
+/*
+** div_1_u64_x_tied1:
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_x_tied1, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z0, 1),
+ z0 = svdiv_x (p0, z0, 1))
+
+/*
+** div_1_u64_x_untied:
+** mov z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_1_u64_x_untied, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z1, 1),
+ z0 = svdiv_x (p0, z1, 1))
+
/*
** div_2_u64_x_tied1:
-** mov (z[0-9]+\.d), #2
-** udiv z0\.d, p0/m, z0\.d, \1
+** lsr z0\.d, z0\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u64_x_tied1, svuint64_t,
@@ -228,10 +364,47 @@ TEST_UNIFORM_Z (div_2_u64_x_tied1, svuint64_t,
/*
** div_2_u64_x_untied:
-** mov z0\.d, #2
-** udivr z0\.d, p0/m, z0\.d, z1\.d
+** lsr z0\.d, z1\.d, #1
** ret
*/
TEST_UNIFORM_Z (div_2_u64_x_untied, svuint64_t,
z0 = svdiv_n_u64_x (p0, z1, 2),
z0 = svdiv_x (p0, z1, 2))
+
+/*
+** div_3_u64_x_tied1:
+** mov (z[0-9]+\.d), #3
+** udiv z0\.d, p0/m, z0\.d, \1
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_x_tied1, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z0, 3),
+ z0 = svdiv_x (p0, z0, 3))
+
+/*
+** div_3_u64_x_untied:
+** mov z0\.d, #3
+** udivr z0\.d, p0/m, z0\.d, z1\.d
+** ret
+*/
+TEST_UNIFORM_Z (div_3_u64_x_untied, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z1, 3),
+ z0 = svdiv_x (p0, z1, 3))
+
+/*
+** div_maxpow_u64_x_tied1:
+** lsr z0\.d, z0\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_x_tied1, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z0, MAXPOW),
+ z0 = svdiv_x (p0, z0, MAXPOW))
+
+/*
+** div_maxpow_u64_x_untied:
+** lsr z0\.d, z1\.d, #63
+** ret
+*/
+TEST_UNIFORM_Z (div_maxpow_u64_x_untied, svuint64_t,
+ z0 = svdiv_n_u64_x (p0, z1, MAXPOW),
+ z0 = svdiv_x (p0, z1, MAXPOW))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
new file mode 100644
index 00000000000..c96bb2763dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
@@ -0,0 +1,91 @@
+/* { dg-do run { target aarch64_sve128_hw } } */
+/* { dg-options "-O2 -msve-vector-bits=128" } */
+
+#include <arm_sve.h>
+#include <stdint.h>
+
+typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat16_t svfloat16_ __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat32_t svfloat32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svfloat64_t svfloat64_ __attribute__((arm_sve_vector_bits(128)));
+typedef svint32_t svint32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svint64_t svint64_ __attribute__((arm_sve_vector_bits(128)));
+typedef svuint32_t svuint32_ __attribute__((arm_sve_vector_bits(128)));
+typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128)));
+
+#define F(T, TS, P, OP1, OP2) \
+{ \
+ T##_t op1 = (T##_t) OP1; \
+ T##_t op2 = (T##_t) OP2; \
+ sv##T##_ res = svdiv_##P (pg, svdup_##TS (op1), svdup_##TS (op2)); \
+ sv##T##_ exp = svdup_##TS (op1 / op2); \
+ if (svptest_any (pg, svcmpne (pg, exp, res))) \
+ __builtin_abort (); \
+ \
+ sv##T##_ res_n = svdiv_##P (pg, svdup_##TS (op1), op2); \
+ if (svptest_any (pg, svcmpne (pg, exp, res_n))) \
+ __builtin_abort (); \
+}
+
+#define TEST_TYPES_1(T, TS) \
+ F (T, TS, m, 79, 16) \
+ F (T, TS, z, 79, 16) \
+ F (T, TS, x, 79, 16)
+
+#define TEST_TYPES \
+ TEST_TYPES_1 (float16, f16) \
+ TEST_TYPES_1 (float32, f32) \
+ TEST_TYPES_1 (float64, f64) \
+ TEST_TYPES_1 (int32, s32) \
+ TEST_TYPES_1 (int64, s64) \
+ TEST_TYPES_1 (uint32, u32) \
+ TEST_TYPES_1 (uint64, u64)
+
+#define TEST_VALUES_S_1(B, OP1, OP2) \
+ F (int##B, s##B, x, OP1, OP2)
+
+#define TEST_VALUES_S \
+ TEST_VALUES_S_1 (32, INT32_MIN, INT32_MIN) \
+ TEST_VALUES_S_1 (64, INT64_MIN, INT64_MIN) \
+ TEST_VALUES_S_1 (32, -7, 4) \
+ TEST_VALUES_S_1 (64, -7, 4) \
+ TEST_VALUES_S_1 (32, INT32_MAX, (1 << 30)) \
+ TEST_VALUES_S_1 (64, INT64_MAX, (1ULL << 62)) \
+ TEST_VALUES_S_1 (32, INT32_MIN, (1 << 30)) \
+ TEST_VALUES_S_1 (64, INT64_MIN, (1ULL << 62)) \
+ TEST_VALUES_S_1 (32, INT32_MAX, 1) \
+ TEST_VALUES_S_1 (64, INT64_MAX, 1) \
+ TEST_VALUES_S_1 (32, INT32_MIN, 16) \
+ TEST_VALUES_S_1 (64, INT64_MIN, 16) \
+ TEST_VALUES_S_1 (32, INT32_MAX, -5) \
+ TEST_VALUES_S_1 (64, INT64_MAX, -5) \
+ TEST_VALUES_S_1 (32, INT32_MIN, -4) \
+ TEST_VALUES_S_1 (64, INT64_MIN, -4)
+
+#define TEST_VALUES_U_1(B, OP1, OP2) \
+ F (uint##B, u##B, x, OP1, OP2)
+
+#define TEST_VALUES_U \
+ TEST_VALUES_U_1 (32, UINT32_MAX, UINT32_MAX) \
+ TEST_VALUES_U_1 (64, UINT64_MAX, UINT64_MAX) \
+ TEST_VALUES_U_1 (32, UINT32_MAX, (1 << 31)) \
+ TEST_VALUES_U_1 (64, UINT64_MAX, (1ULL << 63)) \
+ TEST_VALUES_U_1 (32, 7, 4) \
+ TEST_VALUES_U_1 (64, 7, 4) \
+ TEST_VALUES_U_1 (32, 7, 3) \
+ TEST_VALUES_U_1 (64, 7, 3) \
+ TEST_VALUES_U_1 (32, 11, 1) \
+ TEST_VALUES_U_1 (64, 11, 1)
+
+#define TEST_VALUES \
+ TEST_VALUES_S \
+ TEST_VALUES_U
+
+int
+main (void)
+{
+ const pred pg = svptrue_b8 ();
+ TEST_TYPES
+ TEST_VALUES
+ return 0;
+}
--
2.44.0
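As an illustrative aside (a sketch only, not part of the patch; the function
name and the constant 16 below are made up for illustration): with this fold,
a signed division by a power-of-two constant is expected to become a single
asrd rather than a mov/sdiv pair, roughly:

#include <arm_sve.h>

/* Assumed flags: -O2 -march=armv8-a+sve.  The expected code follows the
   asrd patterns exercised by the tests above; 16 == 1 << 4, hence the #4
   shift amount.  */
svint64_t
div_by_16 (svbool_t pg, svint64_t x)
{
  /* Expected after this patch:  asrd z0.d, p0/m, z0.d, #4
     Previously:                 mov (z<n>.d), #16
                                 sdiv z0.d, p0/m, z0.d, z<n>.d  */
  return svdiv_n_s64_x (pg, x, 16);
}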
[-- Attachment #1.3: Type: text/plain, Size: 7896 bytes --]
> On 29 Jul 2024, at 22:55, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> Thanks for doing this.
>
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> [...]
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>> index c49ca1aa524..6500b64c41b 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_s32.c
>> @@ -1,6 +1,9 @@
>> /* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>>
>> #include "test_sve_acle.h"
>> +#include <stdint.h>
>> +
>
> I think it'd be better to drop the explicit include of stdint.h. arm_sve.h
> is defined to include stdint.h itself, and we rely on that elsewhere.
>
> Same for div_s64.c.
Done.
>
>> +#define MAXPOW 1<<30
>>
>> /*
>> ** div_s32_m_tied1:
>> @@ -53,10 +56,27 @@ TEST_UNIFORM_ZX (div_w0_s32_m_untied, svint32_t, int32_t,
>> z0 = svdiv_n_s32_m (p0, z1, x0),
>> z0 = svdiv_m (p0, z1, x0))
>>
>> +/*
>> +** div_1_s32_m_tied1:
>> +** sel z0\.s, p0, z0\.s, z0\.s
>> +** ret
>> +*/
>> +TEST_UNIFORM_Z (div_1_s32_m_tied1, svint32_t,
>> + z0 = svdiv_n_s32_m (p0, z0, 1),
>> + z0 = svdiv_m (p0, z0, 1))
>> +
>> +/*
>> +** div_1_s32_m_untied:
>> +** sel z0\.s, p0, z1\.s, z1\.s
>> +** ret
>> +*/
>> +TEST_UNIFORM_Z (div_1_s32_m_untied, svint32_t,
>> + z0 = svdiv_n_s32_m (p0, z1, 1),
>> + z0 = svdiv_m (p0, z1, 1))
>> +
>
> [ Thanks for adding the tests (which look good to me). If the output
> ever improves in future, we can "defend" the improvement by changing
> the test. But in the meantime, the above defends something that is
> known to work. ]
>
>> [...]
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
>> new file mode 100644
>> index 00000000000..1a3c25b817d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/div_const_run.c
>> @@ -0,0 +1,91 @@
>> +/* { dg-do run { target aarch64_sve128_hw } } */
>> +/* { dg-options "-O2 -msve-vector-bits=128" } */
>> +
>> +#include <arm_sve.h>
>> +#include <stdint.h>
>> +
>> +typedef svbool_t pred __attribute__((arm_sve_vector_bits(128)));
>> +typedef svfloat16_t svfloat16_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svfloat32_t svfloat32_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svfloat64_t svfloat64_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svint32_t svint32_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svint64_t svint64_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svuint32_t svuint32_ __attribute__((arm_sve_vector_bits(128)));
>> +typedef svuint64_t svuint64_ __attribute__((arm_sve_vector_bits(128)));
>> +
>> +#define F(T, TS, P, OP1, OP2) \
>> +{ \
>> + T##_t op1 = (T##_t) OP1; \
>> + T##_t op2 = (T##_t) OP2; \
>> + sv##T##_ res = svdiv_##P (pg, svdup_##TS (op1), svdup_##TS (op2)); \
>> + sv##T##_ exp = svdup_##TS (op1 / op2); \
>> + if (svptest_any (pg, svcmpne (pg, exp, res))) \
>> + __builtin_abort (); \
>> + \
>> + sv##T##_ res_n = svdiv_##P (pg, svdup_##TS (op1), op2); \
>> + if (svptest_any (pg, svcmpne (pg, exp, res_n))) \
>> + __builtin_abort (); \
>> +}
>> +
>> +#define TEST_TYPES_1(T, TS) \
>> + F (T, TS, m, 79, 16) \
>> + F (T, TS, z, 79, 16) \
>> + F (T, TS, x, 79, 16)
>> +
>> +#define TEST_TYPES \
>> + TEST_TYPES_1 (float16, f16) \
>> + TEST_TYPES_1 (float32, f32) \
>> + TEST_TYPES_1 (float64, f64) \
>> + TEST_TYPES_1 (int32, s32) \
>> + TEST_TYPES_1 (int64, s64) \
>> + TEST_TYPES_1 (uint32, u32) \
>> + TEST_TYPES_1 (uint64, u64)
>> +
>> +#define TEST_VALUES_S_1(B, OP1, OP2) \
>> + F (int##B, s##B, x, OP1, OP2)
>> +
>> +#define TEST_VALUES_S \
>> + TEST_VALUES_S_1 (32, INT32_MIN, INT32_MIN) \
>> + TEST_VALUES_S_1 (64, INT64_MIN, INT64_MIN) \
>> + TEST_VALUES_S_1 (32, -7, 4) \
>> + TEST_VALUES_S_1 (64, -7, 4) \
>> + TEST_VALUES_S_1 (32, INT32_MAX, (1 << 30)) \
>> + TEST_VALUES_S_1 (64, INT64_MAX, (1ULL << 62)) \
>> + TEST_VALUES_S_1 (32, INT32_MIN, (1 << 30)) \
>> + TEST_VALUES_S_1 (64, INT64_MIN, (1ULL << 62)) \
>> + TEST_VALUES_S_1 (32, INT32_MAX, 1) \
>> + TEST_VALUES_S_1 (64, INT64_MAX, 1) \
>> + TEST_VALUES_S_1 (32, INT32_MIN, 16) \
>> + TEST_VALUES_S_1 (64, INT64_MIN, 16) \
>> + TEST_VALUES_S_1 (32, INT32_MAX, -5) \
>> + TEST_VALUES_S_1 (64, INT64_MAX, -5) \
>> + TEST_VALUES_S_1 (32, INT32_MIN, -4) \
>> + TEST_VALUES_S_1 (64, INT64_MIN, -4)
>> +
>> +#define TEST_VALUES_U_1(B, OP1, OP2) \
>> + F (uint##B, u##B, x, OP1, OP2)
>> +
>> +#define TEST_VALUES_U \
>> + TEST_VALUES_U_1 (32, UINT32_MAX, UINT32_MAX) \
>> + TEST_VALUES_U_1 (64, UINT64_MAX, UINT64_MAX) \
>> + TEST_VALUES_U_1 (32, UINT32_MAX, (1 << 31)) \
>> + TEST_VALUES_U_1 (64, UINT64_MAX, (1ULL << 63)) \
>> + TEST_VALUES_U_1 (32, 7, 4) \
>> + TEST_VALUES_U_1 (64, 7, 4) \
>> + TEST_VALUES_U_1 (32, 7, 3) \
>> + TEST_VALUES_U_1 (64, 7, 3) \
>> + TEST_VALUES_U_1 (32, 11, 1) \
>> + TEST_VALUES_U_1 (64, 11, 1)
>> +
>> +#define TEST_VALUES \
>> + TEST_VALUES_S \
>> + TEST_VALUES_U
>> +
>> +int
>> +main (void)
>> +{
>> + const pred pg = svptrue_b64 ();
>
> I think this should svptrue_b8 instead. As it stands, the:
>
> if (svptest_any (pg, svcmpne (pg, ...)))
> __builtin_abort ();
>
> tests will only check the first element in each 64-bit chunk.
Done.
>
> OK with those changes from my POV, but please give others 24 hours
> to comment.
>
> Thanks,
> Richard
>
>> + TEST_TYPES
>> + TEST_VALUES
>> + return 0;
>> +}
* Re: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.
2024-07-30 7:47 ` Jennifer Schmitz
@ 2024-07-30 11:22 ` Kyrylo Tkachov
0 siblings, 0 replies; 6+ messages in thread
From: Kyrylo Tkachov @ 2024-07-30 11:22 UTC (permalink / raw)
To: Jennifer Schmitz; +Cc: Richard Sandiford, gcc-patches, Kyrylo Tkachov
Hi Jennifer,
> On 30 Jul 2024, at 09:47, Jennifer Schmitz <jschmitz@nvidia.com> wrote:
>
> Dear Richard,
> Thanks for the feedback. Great to see this patch approved! I made the changes as suggested.
> Best,
> Jennifer
> <0001-SVE-intrinsics-Add-strength-reduction-for-division-b.patch>
Thanks, I’m okay with the patch as well and have pushed it to trunk as commit 7cde140863e.
To commit future patches yourself, you should apply for Write After Approval commit access by filling in the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi . You can use my email address as approver.
Kyrill
end of thread, other threads:[~2024-07-30 11:22 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
2024-07-17 7:02 [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant Jennifer Schmitz
2024-07-17 7:57 ` Richard Sandiford
2024-07-29 14:07 ` Jennifer Schmitz
2024-07-29 20:55 ` Richard Sandiford
2024-07-30 7:47 ` Jennifer Schmitz
2024-07-30 11:22 ` Kyrylo Tkachov