* [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
@ 2024-08-30 11:41 Jennifer Schmitz
2024-08-30 12:17 ` Richard Sandiford
0 siblings, 1 reply; 4+ messages in thread
From: Jennifer Schmitz @ 2024-08-30 11:41 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford, Richard Biener, Kyrylo Tkachov
[-- Attachment #1.1: Type: text/plain, Size: 905 bytes --]
This patch implements constant folding for svdiv. If the predicate is
ptrue or predication is _x, it uses vector_const_binop with
aarch64_const_binop as the callback and tree_code TRUNC_DIV_EXPR to fold constant
integer operands.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.
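For illustration, the fold turns calls with constant operands into vector
constants; a minimal sketch (the function names are only illustrative, the
behaviour matches the new tests):

#include "arm_sve.h"

svint64_t f (svbool_t pg)
{
  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));  /* folds to a splat of 1 */
}

svint64_t g (svbool_t pg)
{
  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));  /* folds to a splat of 0 */
}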
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Try constant folding.
* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
Add special case for division by 0.
gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
[-- Attachment #1.2: 0002-SVE-intrinsics-Fold-constant-operands-for-svdiv.patch --]
[-- Type: application/octet-stream, Size: 9789 bytes --]
From 92583732da28f6eb4a8db484fa3b24d55a7265e6 Mon Sep 17 00:00:00 2001
From: Jennifer Schmitz <jschmitz@nvidia.com>
Date: Thu, 29 Aug 2024 05:04:51 -0700
Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
This patch implements constant folding for svdiv. If the predicate is
ptrue or predication is _x, it uses vector_const_binop with
aarch64_const_binop as the callback and tree_code TRUNC_DIV_EXPR to fold constant
integer operands.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Try constant folding.
* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
Add special case for division by 0.
gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
---
.../aarch64/aarch64-sve-builtins-base.cc | 19 +-
gcc/config/aarch64/aarch64-sve-builtins.cc | 4 +
.../gcc.target/aarch64/sve/const_fold_div_1.c | 336 ++++++++++++++++++
3 files changed, 356 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index d55bee0b72f..617c7fc87e5 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -755,8 +755,21 @@ public:
gimple *
fold (gimple_folder &f) const override
{
- tree divisor = gimple_call_arg (f.call, 2);
- tree divisor_cst = uniform_integer_cst_p (divisor);
+ tree pg = gimple_call_arg (f.call, 0);
+ tree op1 = gimple_call_arg (f.call, 1);
+ tree op2 = gimple_call_arg (f.call, 2);
+
+ /* Try to fold constant integer operands. */
+ if (f.type_suffix (0).integer_p
+ && (f.pred == PRED_x
+ || is_ptrue (pg, f.type_suffix (0).element_bytes)))
+ if (tree res = vector_const_binop (TRUNC_DIV_EXPR, op1, op2,
+ aarch64_const_binop))
+ return gimple_build_assign (f.lhs, res);
+
+ /* If the divisor is a uniform power of 2, fold to a shift
+ instruction. */
+ tree divisor_cst = uniform_integer_cst_p (op2);
if (!divisor_cst || !integer_pow2p (divisor_cst))
return NULL;
@@ -770,7 +783,7 @@ public:
shapes::binary_uint_opt_n, MODE_n,
f.type_suffix_ids, GROUP_none, f.pred);
call = f.redirect_call (instance);
- tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
+ tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
}
else
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 315d5ac4177..c1b28ebfe4e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3444,6 +3444,10 @@ aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
signop sign = TYPE_SIGN (type);
wi::overflow_type overflow = wi::OVF_NONE;
+ /* Return 0 for division by 0. */
+ if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
+ return arg2;
+
if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
return NULL_TREE;
return force_fit_type (type, poly_res, false,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
new file mode 100644
index 00000000000..062fb6e560e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
@@ -0,0 +1,336 @@
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-options "-O2" } */
+
+#include "arm_sve.h"
+
+/*
+** s64_x_pg:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_x_pg (svbool_t pg)
+{
+ return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_0 (svbool_t pg)
+{
+ return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_by0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_by0 (svbool_t pg)
+{
+ return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
+}
+
+/*
+** s64_z_pg:
+** mov z[0-9]+\.d, p[0-7]/z, #1
+** ret
+*/
+svint64_t s64_z_pg (svbool_t pg)
+{
+ return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_0:
+** mov z[0-9]+\.d, p[0-7]/z, #0
+** ret
+*/
+svint64_t s64_z_pg_0 (svbool_t pg)
+{
+ return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_by0:
+** mov (z[0-9]+\.d), #5
+** mov (z[0-9]+)\.b, #0
+** sdivr \2\.d, p[0-7]/m, \2\.d, \1
+** ret
+*/
+svint64_t s64_z_pg_by0 (svbool_t pg)
+{
+ return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
+}
+
+/*
+** s64_m_pg:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** sdiv \2, p[0-7]/m, \2, \1
+** ret
+*/
+svint64_t s64_m_pg (svbool_t pg)
+{
+ return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_x_ptrue ()
+{
+ return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_z_ptrue ()
+{
+ return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_m_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_m_ptrue ()
+{
+ return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_x_pg_n (svbool_t pg)
+{
+ return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_pg_n_s64_0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
+{
+ return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
+}
+
+/*
+** s64_x_pg_n_s64_by0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
+{
+ return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_z_pg_n:
+** mov z[0-9]+\.d, p[0-7]/z, #1
+** ret
+*/
+svint64_t s64_z_pg_n (svbool_t pg)
+{
+ return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_pg_n_s64_0:
+** mov z[0-9]+\.d, p[0-7]/z, #0
+** ret
+*/
+svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
+{
+ return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
+}
+
+/*
+** s64_z_pg_n_s64_by0:
+** mov (z[0-9]+\.d), #5
+** mov (z[0-9]+)\.b, #0
+** sdivr \2\.d, p[0-7]/m, \2\.d, \1
+** ret
+*/
+svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
+{
+ return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_m_pg_n:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** sdiv \2, p[0-7]/m, \2, \1
+** ret
+*/
+svint64_t s64_m_pg_n (svbool_t pg)
+{
+ return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_x_ptrue_n ()
+{
+ return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_z_ptrue_n ()
+{
+ return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s64_m_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_m_ptrue_n ()
+{
+ return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** u64_x_pg:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_x_pg (svbool_t pg)
+{
+ return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_z_pg:
+** mov z[0-9]+\.d, p[0-7]/z, #1
+** ret
+*/
+svuint64_t u64_z_pg (svbool_t pg)
+{
+ return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_m_pg:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** udiv \2, p[0-7]/m, \2, \1
+** ret
+*/
+svuint64_t u64_m_pg (svbool_t pg)
+{
+ return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_x_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_x_ptrue ()
+{
+ return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_z_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_z_ptrue ()
+{
+ return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_m_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_m_ptrue ()
+{
+ return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_x_pg_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_x_pg_n (svbool_t pg)
+{
+ return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_z_pg_n:
+** mov z[0-9]+\.d, p[0-7]/z, #1
+** ret
+*/
+svuint64_t u64_z_pg_n (svbool_t pg)
+{
+ return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_m_pg_n:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** udiv \2, p[0-7]/m, \2, \1
+** ret
+*/
+svuint64_t u64_m_pg_n (svbool_t pg)
+{
+ return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_x_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_x_ptrue_n ()
+{
+ return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
+}
+
+/*
+** u64_z_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_z_ptrue_n ()
+{
+ return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
+}
+
+/*
+** u64_m_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_m_ptrue_n ()
+{
+ return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
+}
--
2.44.0
* Re: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
2024-08-30 11:41 [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv Jennifer Schmitz
@ 2024-08-30 12:17 ` Richard Sandiford
2024-09-02 12:05 ` Jennifer Schmitz
0 siblings, 1 reply; 4+ messages in thread
From: Richard Sandiford @ 2024-08-30 12:17 UTC (permalink / raw)
To: Jennifer Schmitz; +Cc: gcc-patches, Richard Biener, Kyrylo Tkachov
Jennifer Schmitz <jschmitz@nvidia.com> writes:
> This patch implements constant folding for svdiv. If the predicate is
> ptrue or predication is _x, it uses vector_const_binop with
> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
> integer operands.
> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
> for division by 0, as defined in the semantics for svdiv.
> Tests were added to check the produced assembly for different
> predicates, signed and unsigned integers, and the svdiv_n_* case.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>
> gcc/
> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
> Try constant folding.
> * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
> Add special case for division by 0.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>
> From 92583732da28f6eb4a8db484fa3b24d55a7265e6 Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz <jschmitz@nvidia.com>
> Date: Thu, 29 Aug 2024 05:04:51 -0700
> Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
>
> This patch implements constant folding for svdiv. If the predicate is
> ptrue or predication is _x, it uses vector_const_binop with
> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
> integer operands.
> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
> for division by 0, as defined in the semantics for svdiv.
> Tests were added to check the produced assembly for different
> predicates, signed and unsigned integers, and the svdiv_n_* case.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>
> gcc/
> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
> Try constant folding.
> * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
> Add special case for division by 0.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
> ---
> .../aarch64/aarch64-sve-builtins-base.cc | 19 +-
> gcc/config/aarch64/aarch64-sve-builtins.cc | 4 +
> .../gcc.target/aarch64/sve/const_fold_div_1.c | 336 ++++++++++++++++++
> 3 files changed, 356 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index d55bee0b72f..617c7fc87e5 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -755,8 +755,21 @@ public:
> gimple *
> fold (gimple_folder &f) const override
> {
> - tree divisor = gimple_call_arg (f.call, 2);
> - tree divisor_cst = uniform_integer_cst_p (divisor);
> + tree pg = gimple_call_arg (f.call, 0);
> + tree op1 = gimple_call_arg (f.call, 1);
> + tree op2 = gimple_call_arg (f.call, 2);
> +
> + /* Try to fold constant integer operands. */
> + if (f.type_suffix (0).integer_p
> + && (f.pred == PRED_x
> + || is_ptrue (pg, f.type_suffix (0).element_bytes)))
> + if (tree res = vector_const_binop (TRUNC_DIV_EXPR, op1, op2,
> + aarch64_const_binop))
> + return gimple_build_assign (f.lhs, res);
To reduce cut-&-paste, it'd be good to put this in a helper:
gimple *gimple_folder::fold_const_binary (tree_code code);
that does the outermost "if" above for "code" rather than TRUNC_DIV_EXPR.
It could return null on failure. Then the caller can just be:
if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
return res;
This could go right at the top of the function, since it doesn't rely
on any of the local variables above.
> +
> + /* If the divisor is a uniform power of 2, fold to a shift
> + instruction. */
> + tree divisor_cst = uniform_integer_cst_p (op2);
>
> if (!divisor_cst || !integer_pow2p (divisor_cst))
> return NULL;
> @@ -770,7 +783,7 @@ public:
> shapes::binary_uint_opt_n, MODE_n,
> f.type_suffix_ids, GROUP_none, f.pred);
> call = f.redirect_call (instance);
> - tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
> + tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
> new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
> }
> else
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 315d5ac4177..c1b28ebfe4e 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3444,6 +3444,10 @@ aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
> signop sign = TYPE_SIGN (type);
> wi::overflow_type overflow = wi::OVF_NONE;
>
> + /* Return 0 for division by 0. */
> + if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
> + return arg2;
> +
> if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
> return NULL_TREE;
> return force_fit_type (type, poly_res, false,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
> new file mode 100644
> index 00000000000..062fb6e560e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
> @@ -0,0 +1,336 @@
> +/* { dg-final { check-function-bodies "**" "" } } */
> +/* { dg-options "-O2" } */
> +
> +#include "arm_sve.h"
> +
> +/*
> +** s64_x_pg:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svint64_t s64_x_pg (svbool_t pg)
> +{
> + return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_x_pg_0:
> +** mov z[0-9]+\.b, #0
> +** ret
> +*/
> +svint64_t s64_x_pg_0 (svbool_t pg)
> +{
> + return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_x_pg_by0:
> +** mov z[0-9]+\.b, #0
> +** ret
> +*/
> +svint64_t s64_x_pg_by0 (svbool_t pg)
> +{
> + return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
> +}
> +
> +/*
> +** s64_z_pg:
> +** mov z[0-9]+\.d, p[0-7]/z, #1
> +** ret
> +*/
> +svint64_t s64_z_pg (svbool_t pg)
> +{
> + return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_z_pg_0:
> +** mov z[0-9]+\.d, p[0-7]/z, #0
> +** ret
> +*/
> +svint64_t s64_z_pg_0 (svbool_t pg)
> +{
> + return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_z_pg_by0:
> +** mov (z[0-9]+\.d), #5
> +** mov (z[0-9]+)\.b, #0
> +** sdivr \2\.d, p[0-7]/m, \2\.d, \1
> +** ret
> +*/
> +svint64_t s64_z_pg_by0 (svbool_t pg)
> +{
> + return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
> +}
> +
> +/*
> +** s64_m_pg:
> +** mov (z[0-9]+\.d), #3
> +** mov (z[0-9]+\.d), #5
> +** sdiv \2, p[0-7]/m, \2, \1
> +** ret
> +*/
> +svint64_t s64_m_pg (svbool_t pg)
> +{
> + return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_x_ptrue:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svint64_t s64_x_ptrue ()
> +{
> + return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_z_ptrue:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svint64_t s64_z_ptrue ()
> +{
> + return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_m_ptrue:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svint64_t s64_m_ptrue ()
> +{
> + return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_x_pg_n:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svint64_t s64_x_pg_n (svbool_t pg)
> +{
> + return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_x_pg_n_s64_0:
> +** mov z[0-9]+\.b, #0
> +** ret
> +*/
> +svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
> +{
> + return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
> +}
> +
> +/*
> +** s64_x_pg_n_s64_by0:
> +** mov z[0-9]+\.b, #0
> +** ret
> +*/
> +svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
> +{
> + return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
> +}
> +
> +/*
> +** s64_z_pg_n:
> +** mov z[0-9]+\.d, p[0-7]/z, #1
> +** ret
> +*/
> +svint64_t s64_z_pg_n (svbool_t pg)
> +{
> + return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_z_pg_n_s64_0:
> +** mov z[0-9]+\.d, p[0-7]/z, #0
> +** ret
> +*/
> +svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
> +{
> + return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
> +}
> +
> +/*
> +** s64_z_pg_n_s64_by0:
> +** mov (z[0-9]+\.d), #5
> +** mov (z[0-9]+)\.b, #0
> +** sdivr \2\.d, p[0-7]/m, \2\.d, \1
> +** ret
> +*/
> +svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
> +{
> + return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
> +}
> +
> +/*
> +** s64_m_pg_n:
> +** mov (z[0-9]+\.d), #3
> +** mov (z[0-9]+\.d), #5
> +** sdiv \2, p[0-7]/m, \2, \1
> +** ret
> +*/
> +svint64_t s64_m_pg_n (svbool_t pg)
> +{
> + return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_x_ptrue_n:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svint64_t s64_x_ptrue_n ()
> +{
> + return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_z_ptrue_n:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svint64_t s64_z_ptrue_n ()
> +{
> + return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_m_ptrue_n:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svint64_t s64_m_ptrue_n ()
> +{
> + return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
> +}
> +
> +/*
> +** u64_x_pg:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svuint64_t u64_x_pg (svbool_t pg)
> +{
> + return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_z_pg:
> +** mov z[0-9]+\.d, p[0-7]/z, #1
> +** ret
> +*/
> +svuint64_t u64_z_pg (svbool_t pg)
> +{
> + return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_m_pg:
> +** mov (z[0-9]+\.d), #3
> +** mov (z[0-9]+\.d), #5
> +** udiv \2, p[0-7]/m, \2, \1
> +** ret
> +*/
> +svuint64_t u64_m_pg (svbool_t pg)
> +{
> + return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_x_ptrue:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svuint64_t u64_x_ptrue ()
> +{
> + return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_z_ptrue:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svuint64_t u64_z_ptrue ()
> +{
> + return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_m_ptrue:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svuint64_t u64_m_ptrue ()
> +{
> + return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_x_pg_n:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svuint64_t u64_x_pg_n (svbool_t pg)
> +{
> + return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_z_pg_n:
> +** mov z[0-9]+\.d, p[0-7]/z, #1
> +** ret
> +*/
> +svuint64_t u64_z_pg_n (svbool_t pg)
> +{
> + return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_m_pg_n:
> +** mov (z[0-9]+\.d), #3
> +** mov (z[0-9]+\.d), #5
> +** udiv \2, p[0-7]/m, \2, \1
> +** ret
> +*/
> +svuint64_t u64_m_pg_n (svbool_t pg)
> +{
> + return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_x_ptrue_n:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svuint64_t u64_x_ptrue_n ()
> +{
> + return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_z_ptrue_n:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svuint64_t u64_z_ptrue_n ()
> +{
> + return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_m_ptrue_n:
> +** mov z[0-9]+\.d, #1
> +** ret
> +*/
> +svuint64_t u64_m_ptrue_n ()
> +{
> + return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
> +}
These are good tests, but maybe we could throw in a small number
of svdupq tests as well, to test for non-uniform cases. E.g.:
svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11), svdupq_s32 (4, 1, -6, 0));
which hopefully should get optimised to zero.
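(Lane by lane that is 3/4, 0/1, -5/-6 and 11/0, which all truncate to 0,
the last one following the new divide-by-zero rule.)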
Similarly:
svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4), svdupq_s32 (-3, 15, -50, 2));
should get optimised to -2.
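(Here every lane is -2: 6/-3, -30/15, 100/-50 and -4/2.)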
Looks good to me otherwise.
Thanks,
Richard
* Re: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
2024-08-30 12:17 ` Richard Sandiford
@ 2024-09-02 12:05 ` Jennifer Schmitz
2024-09-02 12:35 ` Richard Sandiford
0 siblings, 1 reply; 4+ messages in thread
From: Jennifer Schmitz @ 2024-09-02 12:05 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Richard Biener, Kyrylo Tkachov
[-- Attachment #1.1: Type: text/plain, Size: 13774 bytes --]
> On 30 Aug 2024, at 14:17, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> This patch implements constant folding for svdiv. If the predicate is
>> ptrue or predication is _x, it uses vector_const_binop with
>> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
>> integer operands.
>> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
>> for division by 0, as defined in the semantics for svdiv.
>> Tests were added to check the produced assembly for different
>> predicates, signed and unsigned integers, and the svdiv_n_* case.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>>
>> gcc/
>> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>> Try constant folding.
>> * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
>> Add special case for division by 0.
>>
>> gcc/testsuite/
>> * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>>
>> From 92583732da28f6eb4a8db484fa3b24d55a7265e6 Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz <jschmitz@nvidia.com>
>> Date: Thu, 29 Aug 2024 05:04:51 -0700
>> Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
>>
>> This patch implements constant folding for svdiv. If the predicate is
>> ptrue or predication is _x, it uses vector_const_binop with
>> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
>> integer operands.
>> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
>> for division by 0, as defined in the semantics for svdiv.
>> Tests were added to check the produced assembly for different
>> predicates, signed and unsigned integers, and the svdiv_n_* case.
>>
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>>
>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>>
>> gcc/
>> * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>> Try constant folding.
>> * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
>> Add special case for division by 0.
>>
>> gcc/testsuite/
>> * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc | 19 +-
>> gcc/config/aarch64/aarch64-sve-builtins.cc | 4 +
>> .../gcc.target/aarch64/sve/const_fold_div_1.c | 336 ++++++++++++++++++
>> 3 files changed, 356 insertions(+), 3 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>>
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index d55bee0b72f..617c7fc87e5 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -755,8 +755,21 @@ public:
>> gimple *
>> fold (gimple_folder &f) const override
>> {
>> - tree divisor = gimple_call_arg (f.call, 2);
>> - tree divisor_cst = uniform_integer_cst_p (divisor);
>> + tree pg = gimple_call_arg (f.call, 0);
>> + tree op1 = gimple_call_arg (f.call, 1);
>> + tree op2 = gimple_call_arg (f.call, 2);
>> +
>> + /* Try to fold constant integer operands. */
>> + if (f.type_suffix (0).integer_p
>> + && (f.pred == PRED_x
>> + || is_ptrue (pg, f.type_suffix (0).element_bytes)))
>> + if (tree res = vector_const_binop (TRUNC_DIV_EXPR, op1, op2,
>> + aarch64_const_binop))
>> + return gimple_build_assign (f.lhs, res);
>
> To reduce cut-&-paste, it'd be good to put this in a helper:
>
> gimple *gimple_folder::fold_const_binary (tree_code code);
>
> that does the outermost "if" above for "code" rather than TRUNC_DIV_EXPR.
> It could return null on failure. Then the caller can just be:
>
> if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
> return res;
>
> This could go right at the top of the function, since it doesn't rely
> on any of the local variables above.
Done.
>
>> +
>> + /* If the divisor is a uniform power of 2, fold to a shift
>> + instruction. */
>> + tree divisor_cst = uniform_integer_cst_p (op2);
>>
>> if (!divisor_cst || !integer_pow2p (divisor_cst))
>> return NULL;
>> @@ -770,7 +783,7 @@ public:
>> shapes::binary_uint_opt_n, MODE_n,
>> f.type_suffix_ids, GROUP_none, f.pred);
>> call = f.redirect_call (instance);
>> - tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
>> + tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
>> new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
>> }
>> else
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 315d5ac4177..c1b28ebfe4e 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -3444,6 +3444,10 @@ aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
>> signop sign = TYPE_SIGN (type);
>> wi::overflow_type overflow = wi::OVF_NONE;
>>
>> + /* Return 0 for division by 0. */
>> + if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
>> + return arg2;
>> +
>> if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
>> return NULL_TREE;
>> return force_fit_type (type, poly_res, false,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> new file mode 100644
>> index 00000000000..062fb6e560e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> @@ -0,0 +1,336 @@
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "arm_sve.h"
>> +
>> +/*
>> +** s64_x_pg:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svint64_t s64_x_pg (svbool_t pg)
>> +{
>> + return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_0:
>> +** mov z[0-9]+\.b, #0
>> +** ret
>> +*/
>> +svint64_t s64_x_pg_0 (svbool_t pg)
>> +{
>> + return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_by0:
>> +** mov z[0-9]+\.b, #0
>> +** ret
>> +*/
>> +svint64_t s64_x_pg_by0 (svbool_t pg)
>> +{
>> + return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
>> +}
>> +
>> +/*
>> +** s64_z_pg:
>> +** mov z[0-9]+\.d, p[0-7]/z, #1
>> +** ret
>> +*/
>> +svint64_t s64_z_pg (svbool_t pg)
>> +{
>> + return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_pg_0:
>> +** mov z[0-9]+\.d, p[0-7]/z, #0
>> +** ret
>> +*/
>> +svint64_t s64_z_pg_0 (svbool_t pg)
>> +{
>> + return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_pg_by0:
>> +** mov (z[0-9]+\.d), #5
>> +** mov (z[0-9]+)\.b, #0
>> +** sdivr \2\.d, p[0-7]/m, \2\.d, \1
>> +** ret
>> +*/
>> +svint64_t s64_z_pg_by0 (svbool_t pg)
>> +{
>> + return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
>> +}
>> +
>> +/*
>> +** s64_m_pg:
>> +** mov (z[0-9]+\.d), #3
>> +** mov (z[0-9]+\.d), #5
>> +** sdiv \2, p[0-7]/m, \2, \1
>> +** ret
>> +*/
>> +svint64_t s64_m_pg (svbool_t pg)
>> +{
>> + return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_ptrue:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svint64_t s64_x_ptrue ()
>> +{
>> + return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_ptrue:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svint64_t s64_z_ptrue ()
>> +{
>> + return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_m_ptrue:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svint64_t s64_m_ptrue ()
>> +{
>> + return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_n:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svint64_t s64_x_pg_n (svbool_t pg)
>> +{
>> + return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_x_pg_n_s64_0:
>> +** mov z[0-9]+\.b, #0
>> +** ret
>> +*/
>> +svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
>> +{
>> + return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
>> +}
>> +
>> +/*
>> +** s64_x_pg_n_s64_by0:
>> +** mov z[0-9]+\.b, #0
>> +** ret
>> +*/
>> +svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
>> +{
>> + return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n:
>> +** mov z[0-9]+\.d, p[0-7]/z, #1
>> +** ret
>> +*/
>> +svint64_t s64_z_pg_n (svbool_t pg)
>> +{
>> + return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n_s64_0:
>> +** mov z[0-9]+\.d, p[0-7]/z, #0
>> +** ret
>> +*/
>> +svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
>> +{
>> + return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n_s64_by0:
>> +** mov (z[0-9]+\.d), #5
>> +** mov (z[0-9]+)\.b, #0
>> +** sdivr \2\.d, p[0-7]/m, \2\.d, \1
>> +** ret
>> +*/
>> +svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
>> +{
>> + return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
>> +}
>> +
>> +/*
>> +** s64_m_pg_n:
>> +** mov (z[0-9]+\.d), #3
>> +** mov (z[0-9]+\.d), #5
>> +** sdiv \2, p[0-7]/m, \2, \1
>> +** ret
>> +*/
>> +svint64_t s64_m_pg_n (svbool_t pg)
>> +{
>> + return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_x_ptrue_n:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svint64_t s64_x_ptrue_n ()
>> +{
>> + return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_z_ptrue_n:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svint64_t s64_z_ptrue_n ()
>> +{
>> + return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_m_ptrue_n:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svint64_t s64_m_ptrue_n ()
>> +{
>> + return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_x_pg:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svuint64_t u64_x_pg (svbool_t pg)
>> +{
>> + return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_z_pg:
>> +** mov z[0-9]+\.d, p[0-7]/z, #1
>> +** ret
>> +*/
>> +svuint64_t u64_z_pg (svbool_t pg)
>> +{
>> + return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_m_pg:
>> +** mov (z[0-9]+\.d), #3
>> +** mov (z[0-9]+\.d), #5
>> +** udiv \2, p[0-7]/m, \2, \1
>> +** ret
>> +*/
>> +svuint64_t u64_m_pg (svbool_t pg)
>> +{
>> + return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_x_ptrue:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svuint64_t u64_x_ptrue ()
>> +{
>> + return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_z_ptrue:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svuint64_t u64_z_ptrue ()
>> +{
>> + return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_m_ptrue:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svuint64_t u64_m_ptrue ()
>> +{
>> + return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_x_pg_n:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svuint64_t u64_x_pg_n (svbool_t pg)
>> +{
>> + return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_z_pg_n:
>> +** mov z[0-9]+\.d, p[0-7]/z, #1
>> +** ret
>> +*/
>> +svuint64_t u64_z_pg_n (svbool_t pg)
>> +{
>> + return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_m_pg_n:
>> +** mov (z[0-9]+\.d), #3
>> +** mov (z[0-9]+\.d), #5
>> +** udiv \2, p[0-7]/m, \2, \1
>> +** ret
>> +*/
>> +svuint64_t u64_m_pg_n (svbool_t pg)
>> +{
>> + return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_x_ptrue_n:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svuint64_t u64_x_ptrue_n ()
>> +{
>> + return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_z_ptrue_n:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svuint64_t u64_z_ptrue_n ()
>> +{
>> + return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_m_ptrue_n:
>> +** mov z[0-9]+\.d, #1
>> +** ret
>> +*/
>> +svuint64_t u64_m_ptrue_n ()
>> +{
>> + return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>
> These are good tests, but maybe we could throw in a small number
> of svdupq tests as well, to test for non-uniform cases. E.g.:
>
> svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11), svdupq_s32 (4, 1, -6, 0));
>
> which hopefully should get optimised to zero.
>
> Similarly:
>
> svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4), svdupq_s32 (-3, 15, -50, 2));
>
> should get optimised to -2.
>
> Looks good to me otherwise.
>
> Thanks,
> Richard
Thanks for suggesting these tests; I added them to the test file.
Best, Jennifer
[-- Attachment #1.2: 0002-SVE-intrinsics-Fold-constant-operands-for-svdiv.patch --]
[-- Type: application/octet-stream, Size: 12233 bytes --]
From d606d237f48ba60d0f82f125bfdd38e7e2a403e4 Mon Sep 17 00:00:00 2001
From: Jennifer Schmitz <jschmitz@nvidia.com>
Date: Fri, 30 Aug 2024 07:03:49 -0700
Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
This patch implements constant folding for svdiv:
The new function aarch64_const_binop was created, which - in contrast to
int_const_binop - does not treat operations as overflowing. This function is
passed as the callback to vector_const_binop from the new gimple_folder
method fold_const_binary, if the predicate is ptrue or predication is _x.
From svdiv_impl::fold, fold_const_binary is called with TRUNC_DIV_EXPR as
tree_code.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Try constant folding.
* config/aarch64/aarch64-sve-builtins.h: Declare
gimple_folder::fold_const_binary.
* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
New function to fold binary SVE intrinsics without overflow.
(gimple_folder::fold_const_binary): New helper function for
constant folding of SVE intrinsics.
gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
---
.../aarch64/aarch64-sve-builtins-base.cc | 11 +-
gcc/config/aarch64/aarch64-sve-builtins.cc | 41 ++
gcc/config/aarch64/aarch64-sve-builtins.h | 2 +
.../gcc.target/aarch64/sve/const_fold_div_1.c | 358 ++++++++++++++++++
4 files changed, 409 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index d55bee0b72f..6c94d144dc9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -755,8 +755,13 @@ public:
gimple *
fold (gimple_folder &f) const override
{
- tree divisor = gimple_call_arg (f.call, 2);
- tree divisor_cst = uniform_integer_cst_p (divisor);
+ if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
+ return res;
+
+ /* If the divisor is a uniform power of 2, fold to a shift
+ instruction. */
+ tree op2 = gimple_call_arg (f.call, 2);
+ tree divisor_cst = uniform_integer_cst_p (op2);
if (!divisor_cst || !integer_pow2p (divisor_cst))
return NULL;
@@ -770,7 +775,7 @@ public:
shapes::binary_uint_opt_n, MODE_n,
f.type_suffix_ids, GROUP_none, f.pred);
call = f.redirect_call (instance);
- tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
+ tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
}
else
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 5ca9ec32b69..60350e08372 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -1132,6 +1132,30 @@ report_not_enum (location_t location, tree fndecl, unsigned int argno,
" a valid %qT value", actual, argno + 1, fndecl, enumtype);
}
+/* Try to fold constant arguments arg1 and arg2 using the given tree_code.
+ Operations are not treated as overflowing. */
+static tree
+aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
+{
+ if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
+ {
+ poly_wide_int poly_res;
+ tree type = TREE_TYPE (arg1);
+ signop sign = TYPE_SIGN (type);
+ wi::overflow_type overflow = wi::OVF_NONE;
+
+ /* Return 0 for division by 0. */
+ if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
+ return arg2;
+
+ if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
+ return NULL_TREE;
+ return force_fit_type (type, poly_res, false,
+ TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2));
+ }
+ return NULL_TREE;
+}
+
/* Return a hash code for a function_instance. */
hashval_t
function_instance::hash () const
@@ -3616,6 +3640,23 @@ gimple_folder::fold ()
return base->fold (*this);
}
+/* Try to fold constant integer operands. */
+gimple *
+gimple_folder::fold_const_binary (enum tree_code code)
+{
+ gcc_assert (gimple_call_num_args (call) == 3);
+ tree pg = gimple_call_arg (call, 0);
+ tree op1 = gimple_call_arg (call, 1);
+ tree op2 = gimple_call_arg (call, 2);
+
+ if (type_suffix (0).integer_p
+ && (pred == PRED_x || is_ptrue (pg, type_suffix (0).element_bytes)))
+ if (tree res = vector_const_binop (code, op1, op2, aarch64_const_binop))
+ return gimple_build_assign (lhs, res);
+
+ return NULL;
+}
+
function_expander::function_expander (const function_instance &instance,
tree fndecl, tree call_expr_in,
rtx possible_target_in)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 9ab6f202c30..22e9a815039 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -639,6 +639,8 @@ public:
gimple *fold ();
+ gimple *fold_const_binary (enum tree_code);
+
/* Where to insert extra statements that feed the final replacement. */
gimple_stmt_iterator *gsi;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
new file mode 100644
index 00000000000..c15b3fc3aa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
@@ -0,0 +1,358 @@
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-options "-O2" } */
+
+#include "arm_sve.h"
+
+/*
+** s64_x_pg:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_x_pg (svbool_t pg)
+{
+ return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_0 (svbool_t pg)
+{
+ return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_by0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_by0 (svbool_t pg)
+{
+ return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
+}
+
+/*
+** s64_z_pg:
+** mov z[0-9]+\.d, p[0-7]/z, #1
+** ret
+*/
+svint64_t s64_z_pg (svbool_t pg)
+{
+ return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_0:
+** mov z[0-9]+\.d, p[0-7]/z, #0
+** ret
+*/
+svint64_t s64_z_pg_0 (svbool_t pg)
+{
+ return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_by0:
+** mov (z[0-9]+\.d), #5
+** mov (z[0-9]+)\.b, #0
+** sdivr \2\.d, p[0-7]/m, \2\.d, \1
+** ret
+*/
+svint64_t s64_z_pg_by0 (svbool_t pg)
+{
+ return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
+}
+
+/*
+** s64_m_pg:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** sdiv \2, p[0-7]/m, \2, \1
+** ret
+*/
+svint64_t s64_m_pg (svbool_t pg)
+{
+ return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_x_ptrue ()
+{
+ return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_z_ptrue ()
+{
+ return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_m_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_m_ptrue ()
+{
+ return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_x_pg_n (svbool_t pg)
+{
+ return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_pg_n_s64_0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
+{
+ return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
+}
+
+/*
+** s64_x_pg_n_s64_by0:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
+{
+ return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_z_pg_n:
+** mov z[0-9]+\.d, p[0-7]/z, #1
+** ret
+*/
+svint64_t s64_z_pg_n (svbool_t pg)
+{
+ return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_pg_n_s64_0:
+** mov z[0-9]+\.d, p[0-7]/z, #0
+** ret
+*/
+svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
+{
+ return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
+}
+
+/*
+** s64_z_pg_n_s64_by0:
+** mov (z[0-9]+\.d), #5
+** mov (z[0-9]+)\.b, #0
+** sdivr \2\.d, p[0-7]/m, \2\.d, \1
+** ret
+*/
+svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
+{
+ return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_m_pg_n:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** sdiv \2, p[0-7]/m, \2, \1
+** ret
+*/
+svint64_t s64_m_pg_n (svbool_t pg)
+{
+ return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_x_ptrue_n ()
+{
+ return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_z_ptrue_n ()
+{
+ return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s64_m_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svint64_t s64_m_ptrue_n ()
+{
+ return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s32_m_ptrue_dupq:
+** mov z[0-9]+\.b, #0
+** ret
+*/
+svint32_t s32_m_ptrue_dupq ()
+{
+ return svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11),
+ svdupq_s32 (4, 1, -6, 0));
+}
+
+/*
+** s32_z_ptrue_dupq:
+** mov z[0-9]+\.s, #-2
+** ret
+*/
+svint32_t s32_z_ptrue_dupq ()
+{
+ return svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4),
+ svdupq_s32 (-3, 15, -50, 2));
+}
+
+/*
+** u64_x_pg:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_x_pg (svbool_t pg)
+{
+ return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_z_pg:
+** mov z[0-9]+\.d, p[0-7]/z, #1
+** ret
+*/
+svuint64_t u64_z_pg (svbool_t pg)
+{
+ return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_m_pg:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** udiv \2, p[0-7]/m, \2, \1
+** ret
+*/
+svuint64_t u64_m_pg (svbool_t pg)
+{
+ return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_x_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_x_ptrue ()
+{
+ return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_z_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_z_ptrue ()
+{
+ return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_m_ptrue:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_m_ptrue ()
+{
+ return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_x_pg_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_x_pg_n (svbool_t pg)
+{
+ return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_z_pg_n:
+** mov z[0-9]+\.d, p[0-7]/z, #1
+** ret
+*/
+svuint64_t u64_z_pg_n (svbool_t pg)
+{
+ return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_m_pg_n:
+** mov (z[0-9]+\.d), #3
+** mov (z[0-9]+\.d), #5
+** udiv \2, p[0-7]/m, \2, \1
+** ret
+*/
+svuint64_t u64_m_pg_n (svbool_t pg)
+{
+ return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_x_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_x_ptrue_n ()
+{
+ return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
+}
+
+/*
+** u64_z_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_z_ptrue_n ()
+{
+ return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
+}
+
+/*
+** u64_m_ptrue_n:
+** mov z[0-9]+\.d, #1
+** ret
+*/
+svuint64_t u64_m_ptrue_n ()
+{
+ return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
+}
--
2.34.1
* Re: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
2024-09-02 12:05 ` Jennifer Schmitz
@ 2024-09-02 12:35 ` Richard Sandiford
0 siblings, 0 replies; 4+ messages in thread
From: Richard Sandiford @ 2024-09-02 12:35 UTC (permalink / raw)
To: Jennifer Schmitz; +Cc: gcc-patches, Richard Biener, Kyrylo Tkachov
Jennifer Schmitz <jschmitz@nvidia.com> writes:
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 5ca9ec32b69..60350e08372 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -1132,6 +1132,30 @@ report_not_enum (location_t location, tree fndecl, unsigned int argno,
> " a valid %qT value", actual, argno + 1, fndecl, enumtype);
> }
>
> +/* Try to fold constant arguments arg1 and arg2 using the given tree_code.
> + Operations are not treated as overflowing. */
Sorry for the nit, but: the convention is to put argument names in caps,
so ARG1 and ARG2.
> +static tree
> +aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
> +{
> + if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
> + {
> + poly_wide_int poly_res;
> + tree type = TREE_TYPE (arg1);
> + signop sign = TYPE_SIGN (type);
> + wi::overflow_type overflow = wi::OVF_NONE;
> +
> + /* Return 0 for division by 0. */
Maybe add ", like SDIV and UDIV do", to make it clearer where this has
come from.
> + if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
> + return arg2;
> +
> + if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
> + return NULL_TREE;
> + return force_fit_type (type, poly_res, false,
> + TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2));
> + }
> + return NULL_TREE;
> +}
> +
> /* Return a hash code for a function_instance. */
> hashval_t
> function_instance::hash () const
> @@ -3616,6 +3640,23 @@ gimple_folder::fold ()
> return base->fold (*this);
> }
>
> +/* Try to fold constant integer operands. */
Maybe:
/* Try to fold the call to a constant, given that, for integers, the call
is roughly equivalent to binary operation CODE. aarch64_const_binop
handles any differences between CODE and the intrinsic. */
> +gimple *
> +gimple_folder::fold_const_binary (enum tree_code code)
> +{
> + gcc_assert (gimple_call_num_args (call) == 3);
> + tree pg = gimple_call_arg (call, 0);
> + tree op1 = gimple_call_arg (call, 1);
> + tree op2 = gimple_call_arg (call, 2);
> +
> + if (type_suffix (0).integer_p
> + && (pred == PRED_x || is_ptrue (pg, type_suffix (0).element_bytes)))
> + if (tree res = vector_const_binop (code, op1, op2, aarch64_const_binop))
> + return gimple_build_assign (lhs, res);
> +
> + return NULL;
> +}
> +
> function_expander::function_expander (const function_instance &instance,
> tree fndecl, tree call_expr_in,
> rtx possible_target_in)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 9ab6f202c30..22e9a815039 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -639,6 +639,8 @@ public:
>
> gimple *fold ();
>
> + gimple *fold_const_binary (enum tree_code);
> +
Sorry for being so picky, but could you put this above "fold" instead?
fold is the top-level entry point, so I think it should either come
first or last.
OK for trunk with those changes, thanks.
Richard
> /* Where to insert extra statements that feed the final replacement. */
> gimple_stmt_iterator *gsi;
>