public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
@ 2024-08-30 11:41 Jennifer Schmitz
  2024-08-30 12:17 ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Jennifer Schmitz @ 2024-08-30 11:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford, Richard Biener, Kyrylo Tkachov


[-- Attachment #1.1: Type: text/plain, Size: 905 bytes --]

This patch implements constant folding for svdiv. If the predicate is
ptrue or predication is _x, it uses vector_const_binop with
aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
integer operands.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.
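
For illustration, these are roughly the kinds of calls that now fold (the
function names below are made up; the expected results are the ones checked
by the new test file):

  #include "arm_sve.h"

  /* Constant operands with _x predication: folds to svdup_s64 (1).  */
  svint64_t f1 (svbool_t pg)
  {
    return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
  }

  /* Constant division by 0 folds to 0, matching the SDIV/UDIV semantics.  */
  svint64_t f2 (svbool_t pg)
  {
    return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
  }

  /* _z and _m predication fold as well when the predicate is a ptrue.  */
  svint64_t f3 (void)
  {
    return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
  }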

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
	Try constant folding.
	* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
	Add special case for division by 0.

gcc/testsuite/
	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.

[-- Attachment #1.2: 0002-SVE-intrinsics-Fold-constant-operands-for-svdiv.patch --]
[-- Type: application/octet-stream, Size: 9789 bytes --]

From 92583732da28f6eb4a8db484fa3b24d55a7265e6 Mon Sep 17 00:00:00 2001
From: Jennifer Schmitz <jschmitz@nvidia.com>
Date: Thu, 29 Aug 2024 05:04:51 -0700
Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.

This patch implements constant folding for svdiv. If the predicate is
ptrue or predication is _x, it uses vector_const_binop with
aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
integer operands.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
	Try constant folding.
	* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
	Add special case for division by 0.

gcc/testsuite/
	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
---
 .../aarch64/aarch64-sve-builtins-base.cc      |  19 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc    |   4 +
 .../gcc.target/aarch64/sve/const_fold_div_1.c | 336 ++++++++++++++++++
 3 files changed, 356 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index d55bee0b72f..617c7fc87e5 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -755,8 +755,21 @@ public:
   gimple *
   fold (gimple_folder &f) const override
   {
-    tree divisor = gimple_call_arg (f.call, 2);
-    tree divisor_cst = uniform_integer_cst_p (divisor);
+    tree pg = gimple_call_arg (f.call, 0);
+    tree op1 = gimple_call_arg (f.call, 1);
+    tree op2 = gimple_call_arg (f.call, 2);
+
+    /* Try to fold constant integer operands.  */
+    if (f.type_suffix (0).integer_p
+	&& (f.pred == PRED_x
+	    || is_ptrue (pg, f.type_suffix (0).element_bytes)))
+      if (tree res = vector_const_binop (TRUNC_DIV_EXPR, op1, op2,
+					 aarch64_const_binop))
+	return gimple_build_assign (f.lhs, res);
+
+    /* If the divisor is a uniform power of 2, fold to a shift
+       instruction.  */
+    tree divisor_cst = uniform_integer_cst_p (op2);
 
     if (!divisor_cst || !integer_pow2p (divisor_cst))
       return NULL;
@@ -770,7 +783,7 @@ public:
 				    shapes::binary_uint_opt_n, MODE_n,
 				    f.type_suffix_ids, GROUP_none, f.pred);
 	call = f.redirect_call (instance);
-	tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
+	tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
 	new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
       }
     else
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 315d5ac4177..c1b28ebfe4e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3444,6 +3444,10 @@ aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
       signop sign = TYPE_SIGN (type);
       wi::overflow_type overflow = wi::OVF_NONE;
 
+      /* Return 0 for division by 0.  */
+      if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
+	return arg2;
+      
       if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
         return NULL_TREE;
       return force_fit_type (type, poly_res, false,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
new file mode 100644
index 00000000000..062fb6e560e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
@@ -0,0 +1,336 @@
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-options "-O2" } */
+
+#include "arm_sve.h"
+
+/*
+** s64_x_pg:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_x_pg (svbool_t pg)
+{
+  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_0:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint64_t s64_x_pg_0 (svbool_t pg)
+{
+  return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_by0:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint64_t s64_x_pg_by0 (svbool_t pg)
+{
+  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
+}
+
+/*
+** s64_z_pg:
+**	mov	z[0-9]+\.d, p[0-7]/z, #1
+**	ret
+*/
+svint64_t s64_z_pg (svbool_t pg)
+{
+  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_0:
+**	mov	z[0-9]+\.d, p[0-7]/z, #0
+**	ret
+*/
+svint64_t s64_z_pg_0 (svbool_t pg)
+{
+  return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_by0:
+**	mov	(z[0-9]+\.d), #5
+**	mov	(z[0-9]+)\.b, #0
+**	sdivr	\2\.d, p[0-7]/m, \2\.d, \1
+**	ret
+*/
+svint64_t s64_z_pg_by0 (svbool_t pg)
+{
+  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
+}
+
+/*
+** s64_m_pg:
+**	mov	(z[0-9]+\.d), #3
+**	mov	(z[0-9]+\.d), #5
+**	sdiv	\2, p[0-7]/m, \2, \1
+**	ret
+*/
+svint64_t s64_m_pg (svbool_t pg)
+{
+  return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_x_ptrue ()
+{
+  return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_z_ptrue ()
+{
+  return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_m_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_m_ptrue ()
+{
+  return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_x_pg_n (svbool_t pg)
+{
+  return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_pg_n_s64_0:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
+{
+  return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
+}
+
+/*
+** s64_x_pg_n_s64_by0:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
+{
+  return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_z_pg_n:
+**	mov	z[0-9]+\.d, p[0-7]/z, #1
+**	ret
+*/
+svint64_t s64_z_pg_n (svbool_t pg)
+{
+  return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_pg_n_s64_0:
+**	mov	z[0-9]+\.d, p[0-7]/z, #0
+**	ret
+*/
+svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
+{
+  return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
+}
+
+/*
+** s64_z_pg_n_s64_by0:
+**	mov	(z[0-9]+\.d), #5
+**	mov	(z[0-9]+)\.b, #0
+**	sdivr	\2\.d, p[0-7]/m, \2\.d, \1
+**	ret
+*/
+svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
+{
+  return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_m_pg_n:
+**	mov	(z[0-9]+\.d), #3
+**	mov	(z[0-9]+\.d), #5
+**	sdiv	\2, p[0-7]/m, \2, \1
+**	ret
+*/
+svint64_t s64_m_pg_n (svbool_t pg)
+{
+  return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_x_ptrue_n ()
+{
+  return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_z_ptrue_n ()
+{
+  return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s64_m_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_m_ptrue_n ()
+{
+  return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** u64_x_pg:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_x_pg (svbool_t pg)
+{
+  return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_z_pg:
+**	mov	z[0-9]+\.d, p[0-7]/z, #1
+**	ret
+*/
+svuint64_t u64_z_pg (svbool_t pg)
+{
+  return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_m_pg:
+**	mov	(z[0-9]+\.d), #3
+**	mov	(z[0-9]+\.d), #5
+**	udiv	\2, p[0-7]/m, \2, \1
+**	ret
+*/
+svuint64_t u64_m_pg (svbool_t pg)
+{
+  return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_x_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_x_ptrue ()
+{
+  return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_z_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_z_ptrue ()
+{
+  return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_m_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_m_ptrue ()
+{
+  return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_x_pg_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_x_pg_n (svbool_t pg)
+{
+  return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_z_pg_n:
+**	mov	z[0-9]+\.d, p[0-7]/z, #1
+**	ret
+*/
+svuint64_t u64_z_pg_n (svbool_t pg)
+{
+  return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_m_pg_n:
+**	mov	(z[0-9]+\.d), #3
+**	mov	(z[0-9]+\.d), #5
+**	udiv	\2, p[0-7]/m, \2, \1
+**	ret
+*/
+svuint64_t u64_m_pg_n (svbool_t pg)
+{
+  return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_x_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_x_ptrue_n ()
+{
+  return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
+}
+
+/*
+** u64_z_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_z_ptrue_n ()
+{
+  return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
+}
+
+/*
+** u64_m_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_m_ptrue_n ()
+{
+  return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
+}
-- 
2.44.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
  2024-08-30 11:41 [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv Jennifer Schmitz
@ 2024-08-30 12:17 ` Richard Sandiford
  2024-09-02 12:05   ` Jennifer Schmitz
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Sandiford @ 2024-08-30 12:17 UTC (permalink / raw)
  To: Jennifer Schmitz; +Cc: gcc-patches, Richard Biener, Kyrylo Tkachov

Jennifer Schmitz <jschmitz@nvidia.com> writes:
> This patch implements constant folding for svdiv. If the predicate is
> ptrue or predication is _x, it uses vector_const_binop with
> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
> integer operands.
> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
> for division by 0, as defined in the semantics for svdiv.
> Tests were added to check the produced assembly for different
> predicates, signed and unsigned integers, and the svdiv_n_* case.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>
> gcc/
> 	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
> 	Try constant folding.
> 	* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
> 	Add special case for division by 0.
>
> gcc/testsuite/
> 	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>
> From 92583732da28f6eb4a8db484fa3b24d55a7265e6 Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz <jschmitz@nvidia.com>
> Date: Thu, 29 Aug 2024 05:04:51 -0700
> Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
>
> This patch implements constant folding for svdiv. If the predicate is
> ptrue or predication is _x, it uses vector_const_binop with
> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
> integer operands.
> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
> for division by 0, as defined in the semantics for svdiv.
> Tests were added to check the produced assembly for different
> predicates, signed and unsigned integers, and the svdiv_n_* case.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>
> gcc/
> 	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
> 	Try constant folding.
> 	* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
> 	Add special case for division by 0.
>
> gcc/testsuite/
> 	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      |  19 +-
>  gcc/config/aarch64/aarch64-sve-builtins.cc    |   4 +
>  .../gcc.target/aarch64/sve/const_fold_div_1.c | 336 ++++++++++++++++++
>  3 files changed, 356 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index d55bee0b72f..617c7fc87e5 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -755,8 +755,21 @@ public:
>    gimple *
>    fold (gimple_folder &f) const override
>    {
> -    tree divisor = gimple_call_arg (f.call, 2);
> -    tree divisor_cst = uniform_integer_cst_p (divisor);
> +    tree pg = gimple_call_arg (f.call, 0);
> +    tree op1 = gimple_call_arg (f.call, 1);
> +    tree op2 = gimple_call_arg (f.call, 2);
> +
> +    /* Try to fold constant integer operands.  */
> +    if (f.type_suffix (0).integer_p
> +	&& (f.pred == PRED_x
> +	    || is_ptrue (pg, f.type_suffix (0).element_bytes)))
> +      if (tree res = vector_const_binop (TRUNC_DIV_EXPR, op1, op2,
> +					 aarch64_const_binop))
> +	return gimple_build_assign (f.lhs, res);

To reduce cut-&-paste, it'd be good to put this in a helper:

  gimple *gimple_folder::fold_const_binary (tree_code code);

that does the outermost "if" above for "code" rather than TRUNC_DIV_EXPR.
It could return null on failure.  Then the caller can just be:

  if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
    return res;

This could go right at the top of the function, since it doesn't rely
on any of the local variables above.
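
Something like this, for instance (just the condition above moved into a
gimple_folder member; untested):

  gimple *
  gimple_folder::fold_const_binary (tree_code code)
  {
    tree pg = gimple_call_arg (call, 0);
    tree op1 = gimple_call_arg (call, 1);
    tree op2 = gimple_call_arg (call, 2);

    /* Fold if the predicate is all-true or inactive lanes don't matter.  */
    if (type_suffix (0).integer_p
        && (pred == PRED_x || is_ptrue (pg, type_suffix (0).element_bytes)))
      if (tree res = vector_const_binop (code, op1, op2, aarch64_const_binop))
        return gimple_build_assign (lhs, res);

    return NULL;
  }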

> +
> +    /* If the divisor is a uniform power of 2, fold to a shift
> +       instruction.  */
> +    tree divisor_cst = uniform_integer_cst_p (op2);
>  
>      if (!divisor_cst || !integer_pow2p (divisor_cst))
>        return NULL;
> @@ -770,7 +783,7 @@ public:
>  				    shapes::binary_uint_opt_n, MODE_n,
>  				    f.type_suffix_ids, GROUP_none, f.pred);
>  	call = f.redirect_call (instance);
> -	tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
> +	tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
>  	new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
>        }
>      else
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 315d5ac4177..c1b28ebfe4e 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3444,6 +3444,10 @@ aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
>        signop sign = TYPE_SIGN (type);
>        wi::overflow_type overflow = wi::OVF_NONE;
>  
> +      /* Return 0 for division by 0.  */
> +      if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
> +	return arg2;
> +      
>        if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
>          return NULL_TREE;
>        return force_fit_type (type, poly_res, false,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
> new file mode 100644
> index 00000000000..062fb6e560e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
> @@ -0,0 +1,336 @@
> +/* { dg-final { check-function-bodies "**" "" } } */
> +/* { dg-options "-O2" } */
> +
> +#include "arm_sve.h"
> +
> +/*
> +** s64_x_pg:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svint64_t s64_x_pg (svbool_t pg)
> +{
> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_x_pg_0:
> +**	mov	z[0-9]+\.b, #0
> +**	ret
> +*/
> +svint64_t s64_x_pg_0 (svbool_t pg)
> +{
> +  return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_x_pg_by0:
> +**	mov	z[0-9]+\.b, #0
> +**	ret
> +*/
> +svint64_t s64_x_pg_by0 (svbool_t pg)
> +{
> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
> +}
> +
> +/*
> +** s64_z_pg:
> +**	mov	z[0-9]+\.d, p[0-7]/z, #1
> +**	ret
> +*/
> +svint64_t s64_z_pg (svbool_t pg)
> +{
> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_z_pg_0:
> +**	mov	z[0-9]+\.d, p[0-7]/z, #0
> +**	ret
> +*/
> +svint64_t s64_z_pg_0 (svbool_t pg)
> +{
> +  return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_z_pg_by0:
> +**	mov	(z[0-9]+\.d), #5
> +**	mov	(z[0-9]+)\.b, #0
> +**	sdivr	\2\.d, p[0-7]/m, \2\.d, \1
> +**	ret
> +*/
> +svint64_t s64_z_pg_by0 (svbool_t pg)
> +{
> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
> +}
> +
> +/*
> +** s64_m_pg:
> +**	mov	(z[0-9]+\.d), #3
> +**	mov	(z[0-9]+\.d), #5
> +**	sdiv	\2, p[0-7]/m, \2, \1
> +**	ret
> +*/
> +svint64_t s64_m_pg (svbool_t pg)
> +{
> +  return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_x_ptrue:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svint64_t s64_x_ptrue ()
> +{
> +  return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_z_ptrue:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svint64_t s64_z_ptrue ()
> +{
> +  return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_m_ptrue:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svint64_t s64_m_ptrue ()
> +{
> +  return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
> +}
> +
> +/*
> +** s64_x_pg_n:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svint64_t s64_x_pg_n (svbool_t pg)
> +{
> +  return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_x_pg_n_s64_0:
> +**	mov	z[0-9]+\.b, #0
> +**	ret
> +*/
> +svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
> +{
> +  return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
> +}
> +
> +/*
> +** s64_x_pg_n_s64_by0:
> +**	mov	z[0-9]+\.b, #0
> +**	ret
> +*/
> +svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
> +{
> +  return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
> +}
> +
> +/*
> +** s64_z_pg_n:
> +**	mov	z[0-9]+\.d, p[0-7]/z, #1
> +**	ret
> +*/
> +svint64_t s64_z_pg_n (svbool_t pg)
> +{
> +  return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_z_pg_n_s64_0:
> +**	mov	z[0-9]+\.d, p[0-7]/z, #0
> +**	ret
> +*/
> +svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
> +{
> +  return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
> +}
> +
> +/*
> +** s64_z_pg_n_s64_by0:
> +**	mov	(z[0-9]+\.d), #5
> +**	mov	(z[0-9]+)\.b, #0
> +**	sdivr	\2\.d, p[0-7]/m, \2\.d, \1
> +**	ret
> +*/
> +svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
> +{
> +  return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
> +}
> +
> +/*
> +** s64_m_pg_n:
> +**	mov	(z[0-9]+\.d), #3
> +**	mov	(z[0-9]+\.d), #5
> +**	sdiv	\2, p[0-7]/m, \2, \1
> +**	ret
> +*/
> +svint64_t s64_m_pg_n (svbool_t pg)
> +{
> +  return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_x_ptrue_n:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svint64_t s64_x_ptrue_n ()
> +{
> +  return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_z_ptrue_n:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svint64_t s64_z_ptrue_n ()
> +{
> +  return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
> +}
> +
> +/*
> +** s64_m_ptrue_n:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svint64_t s64_m_ptrue_n ()
> +{
> +  return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
> +}
> +
> +/*
> +** u64_x_pg:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svuint64_t u64_x_pg (svbool_t pg)
> +{
> +  return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_z_pg:
> +**	mov	z[0-9]+\.d, p[0-7]/z, #1
> +**	ret
> +*/
> +svuint64_t u64_z_pg (svbool_t pg)
> +{
> +  return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_m_pg:
> +**	mov	(z[0-9]+\.d), #3
> +**	mov	(z[0-9]+\.d), #5
> +**	udiv	\2, p[0-7]/m, \2, \1
> +**	ret
> +*/
> +svuint64_t u64_m_pg (svbool_t pg)
> +{
> +  return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_x_ptrue:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svuint64_t u64_x_ptrue ()
> +{
> +  return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_z_ptrue:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svuint64_t u64_z_ptrue ()
> +{
> +  return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_m_ptrue:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svuint64_t u64_m_ptrue ()
> +{
> +  return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
> +}
> +
> +/*
> +** u64_x_pg_n:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svuint64_t u64_x_pg_n (svbool_t pg)
> +{
> +  return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_z_pg_n:
> +**	mov	z[0-9]+\.d, p[0-7]/z, #1
> +**	ret
> +*/
> +svuint64_t u64_z_pg_n (svbool_t pg)
> +{
> +  return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_m_pg_n:
> +**	mov	(z[0-9]+\.d), #3
> +**	mov	(z[0-9]+\.d), #5
> +**	udiv	\2, p[0-7]/m, \2, \1
> +**	ret
> +*/
> +svuint64_t u64_m_pg_n (svbool_t pg)
> +{
> +  return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_x_ptrue_n:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svuint64_t u64_x_ptrue_n ()
> +{
> +  return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_z_ptrue_n:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svuint64_t u64_z_ptrue_n ()
> +{
> +  return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
> +}
> +
> +/*
> +** u64_m_ptrue_n:
> +**	mov	z[0-9]+\.d, #1
> +**	ret
> +*/
> +svuint64_t u64_m_ptrue_n ()
> +{
> +  return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
> +}

These are good tests, but maybe we could throw in a small number
of svdupq tests as well, to test for non-uniform cases.  E.g.:

  svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11), svdupq_s32 (4, 1, -6, 0));

which hopefully should get optimised to zero.

Similarly:

  svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4), svdupq_s32 (-3, 15, -50, 2));

should get optimised to -2.
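
In check-function-bodies form that could look something like this (the
function name and the exact mov pattern are just guesses; each lane
divides to -2):

  /*
  ** s32_z_ptrue_dupq:
  **	mov	z[0-9]+\.s, #-2
  **	ret
  */
  svint32_t s32_z_ptrue_dupq ()
  {
    return svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4),
                        svdupq_s32 (-3, 15, -50, 2));
  }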

Looks good to me otherwise.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
  2024-08-30 12:17 ` Richard Sandiford
@ 2024-09-02 12:05   ` Jennifer Schmitz
  2024-09-02 12:35     ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Jennifer Schmitz @ 2024-09-02 12:05 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Richard Biener, Kyrylo Tkachov


[-- Attachment #1.1: Type: text/plain, Size: 13774 bytes --]



> On 30 Aug 2024, at 14:17, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> This patch implements constant folding for svdiv. If the predicate is
>> ptrue or predication is _x, it uses vector_const_binop with
>> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
>> integer operands.
>> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
>> for division by 0, as defined in the semantics for svdiv.
>> Tests were added to check the produced assembly for different
>> predicates, signed and unsigned integers, and the svdiv_n_* case.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>> 
>> gcc/
>>      * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>      Try constant folding.
>>      * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
>>      Add special case for division by 0.
>> 
>> gcc/testsuite/
>>      * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>> 
>> From 92583732da28f6eb4a8db484fa3b24d55a7265e6 Mon Sep 17 00:00:00 2001
>> From: Jennifer Schmitz <jschmitz@nvidia.com>
>> Date: Thu, 29 Aug 2024 05:04:51 -0700
>> Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
>> 
>> This patch implements constant folding for svdiv. If the predicate is
>> ptrue or predication is _x, it uses vector_const_binop with
>> aarch64_const_binop as callback and tree_code TRUNC_DIV_EXPR to fold constant
>> integer operands.
>> In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
>> for division by 0, as defined in the semantics for svdiv.
>> Tests were added to check the produced assembly for different
>> predicates, signed and unsigned integers, and the svdiv_n_* case.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>> 
>> gcc/
>>      * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
>>      Try constant folding.
>>      * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
>>      Add special case for division by 0.
>> 
>> gcc/testsuite/
>>      * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc      |  19 +-
>> gcc/config/aarch64/aarch64-sve-builtins.cc    |   4 +
>> .../gcc.target/aarch64/sve/const_fold_div_1.c | 336 ++++++++++++++++++
>> 3 files changed, 356 insertions(+), 3 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index d55bee0b72f..617c7fc87e5 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -755,8 +755,21 @@ public:
>>   gimple *
>>   fold (gimple_folder &f) const override
>>   {
>> -    tree divisor = gimple_call_arg (f.call, 2);
>> -    tree divisor_cst = uniform_integer_cst_p (divisor);
>> +    tree pg = gimple_call_arg (f.call, 0);
>> +    tree op1 = gimple_call_arg (f.call, 1);
>> +    tree op2 = gimple_call_arg (f.call, 2);
>> +
>> +    /* Try to fold constant integer operands.  */
>> +    if (f.type_suffix (0).integer_p
>> +     && (f.pred == PRED_x
>> +         || is_ptrue (pg, f.type_suffix (0).element_bytes)))
>> +      if (tree res = vector_const_binop (TRUNC_DIV_EXPR, op1, op2,
>> +                                      aarch64_const_binop))
>> +     return gimple_build_assign (f.lhs, res);
> 
> To reduce cut-&-paste, it'd be good to put this in a helper:
> 
>  gimple *gimple_folder::fold_const_binary (tree_code code);
> 
> that does the outermost "if" above for "code" rather than TRUNC_DIV_EXPR.
> It could return null on failure.  Then the caller can just be:
> 
>  if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
>    return res;
> 
> This could go right at the top of the function, since it doesn't rely
> on any of the local variables above.
Done.
> 
>> +
>> +    /* If the divisor is a uniform power of 2, fold to a shift
>> +       instruction.  */
>> +    tree divisor_cst = uniform_integer_cst_p (op2);
>> 
>>     if (!divisor_cst || !integer_pow2p (divisor_cst))
>>       return NULL;
>> @@ -770,7 +783,7 @@ public:
>>                                  shapes::binary_uint_opt_n, MODE_n,
>>                                  f.type_suffix_ids, GROUP_none, f.pred);
>>      call = f.redirect_call (instance);
>> -     tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
>> +     tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
>>      new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
>>       }
>>     else
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 315d5ac4177..c1b28ebfe4e 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -3444,6 +3444,10 @@ aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
>>       signop sign = TYPE_SIGN (type);
>>       wi::overflow_type overflow = wi::OVF_NONE;
>> 
>> +      /* Return 0 for division by 0.  */
>> +      if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
>> +     return arg2;
>> +
>>       if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
>>         return NULL_TREE;
>>       return force_fit_type (type, poly_res, false,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> new file mode 100644
>> index 00000000000..062fb6e560e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>> @@ -0,0 +1,336 @@
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "arm_sve.h"
>> +
>> +/*
>> +** s64_x_pg:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_0:
>> +**   mov     z[0-9]+\.b, #0
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_0 (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_by0:
>> +**   mov     z[0-9]+\.b, #0
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_by0 (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
>> +}
>> +
>> +/*
>> +** s64_z_pg:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_pg_0:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #0
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_0 (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_pg_by0:
>> +**   mov     (z[0-9]+\.d), #5
>> +**   mov     (z[0-9]+)\.b, #0
>> +**   sdivr   \2\.d, p[0-7]/m, \2\.d, \1
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_by0 (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
>> +}
>> +
>> +/*
>> +** s64_m_pg:
>> +**   mov     (z[0-9]+\.d), #3
>> +**   mov     (z[0-9]+\.d), #5
>> +**   sdiv    \2, p[0-7]/m, \2, \1
>> +**   ret
>> +*/
>> +svint64_t s64_m_pg (svbool_t pg)
>> +{
>> +  return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_x_ptrue ()
>> +{
>> +  return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_z_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_z_ptrue ()
>> +{
>> +  return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_m_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_m_ptrue ()
>> +{
>> +  return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>> +}
>> +
>> +/*
>> +** s64_x_pg_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_x_pg_n_s64_0:
>> +**   mov     z[0-9]+\.b, #0
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
>> +}
>> +
>> +/*
>> +** s64_x_pg_n_s64_by0:
>> +**   mov     z[0-9]+\.b, #0
>> +**   ret
>> +*/
>> +svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n_s64_0:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #0
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
>> +}
>> +
>> +/*
>> +** s64_z_pg_n_s64_by0:
>> +**   mov     (z[0-9]+\.d), #5
>> +**   mov     (z[0-9]+)\.b, #0
>> +**   sdivr   \2\.d, p[0-7]/m, \2\.d, \1
>> +**   ret
>> +*/
>> +svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
>> +}
>> +
>> +/*
>> +** s64_m_pg_n:
>> +**   mov     (z[0-9]+\.d), #3
>> +**   mov     (z[0-9]+\.d), #5
>> +**   sdiv    \2, p[0-7]/m, \2, \1
>> +**   ret
>> +*/
>> +svint64_t s64_m_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_x_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_x_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_z_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_z_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** s64_m_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svint64_t s64_m_ptrue_n ()
>> +{
>> +  return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_x_pg:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_x_pg (svbool_t pg)
>> +{
>> +  return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_z_pg:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_z_pg (svbool_t pg)
>> +{
>> +  return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_m_pg:
>> +**   mov     (z[0-9]+\.d), #3
>> +**   mov     (z[0-9]+\.d), #5
>> +**   udiv    \2, p[0-7]/m, \2, \1
>> +**   ret
>> +*/
>> +svuint64_t u64_m_pg (svbool_t pg)
>> +{
>> +  return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_x_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_x_ptrue ()
>> +{
>> +  return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_z_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_z_ptrue ()
>> +{
>> +  return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_m_ptrue:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_m_ptrue ()
>> +{
>> +  return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>> +}
>> +
>> +/*
>> +** u64_x_pg_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_x_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_z_pg_n:
>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_z_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_m_pg_n:
>> +**   mov     (z[0-9]+\.d), #3
>> +**   mov     (z[0-9]+\.d), #5
>> +**   udiv    \2, p[0-7]/m, \2, \1
>> +**   ret
>> +*/
>> +svuint64_t u64_m_pg_n (svbool_t pg)
>> +{
>> +  return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_x_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_x_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_z_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_z_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
>> +
>> +/*
>> +** u64_m_ptrue_n:
>> +**   mov     z[0-9]+\.d, #1
>> +**   ret
>> +*/
>> +svuint64_t u64_m_ptrue_n ()
>> +{
>> +  return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
>> +}
> 
> These are good tests, but maybe we could throw in a small number
> of svdupq tests as well, to test for non-uniform cases.  E.g.:
> 
>  svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11), svdupq_s32 (4, 1, -6, 0));
> 
> which hopefully should get optimised to zero.
> 
> Similarly:
> 
>  svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4), svdupq_s32 (-3, 15, -50, 2));
> 
> should get optimised to -2.
> 
> Looks good to me otherwise.
> 
> Thanks,
> Richard
Thanks for suggesting these tests; I added them to the test file.
Best, Jennifer

[-- Attachment #1.2: 0002-SVE-intrinsics-Fold-constant-operands-for-svdiv.patch --]
[-- Type: application/octet-stream, Size: 12233 bytes --]

From d606d237f48ba60d0f82f125bfdd38e7e2a403e4 Mon Sep 17 00:00:00 2001
From: Jennifer Schmitz <jschmitz@nvidia.com>
Date: Fri, 30 Aug 2024 07:03:49 -0700
Subject: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.

This patch implements constant folding for svdiv:
The new function aarch64_const_binop was created, which - in contrast to
int_const_binop - does not treat operations as overflowing. It is passed as
the callback to vector_const_binop by the new gimple_folder method
fold_const_binary, which applies if the predicate is ptrue or predication
is _x. svdiv_impl::fold calls fold_const_binary with TRUNC_DIV_EXPR as the
tree_code.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
	Try constant folding.
	* config/aarch64/aarch64-sve-builtins.h: Declare
	gimple_folder::fold_const_binary.
	* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
	New function to fold binary SVE intrinsics without overflow.
	(gimple_folder::fold_const_binary): New helper function for
	constant folding of SVE intrinsics.

gcc/testsuite/
	* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
---
 .../aarch64/aarch64-sve-builtins-base.cc      |  11 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc    |  41 ++
 gcc/config/aarch64/aarch64-sve-builtins.h     |   2 +
 .../gcc.target/aarch64/sve/const_fold_div_1.c | 358 ++++++++++++++++++
 4 files changed, 409 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index d55bee0b72f..6c94d144dc9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -755,8 +755,13 @@ public:
   gimple *
   fold (gimple_folder &f) const override
   {
-    tree divisor = gimple_call_arg (f.call, 2);
-    tree divisor_cst = uniform_integer_cst_p (divisor);
+    if (auto *res = f.fold_const_binary (TRUNC_DIV_EXPR))
+      return res;
+
+    /* If the divisor is a uniform power of 2, fold to a shift
+       instruction.  */
+    tree op2 = gimple_call_arg (f.call, 2);
+    tree divisor_cst = uniform_integer_cst_p (op2);
 
     if (!divisor_cst || !integer_pow2p (divisor_cst))
       return NULL;
@@ -770,7 +775,7 @@ public:
 				    shapes::binary_uint_opt_n, MODE_n,
 				    f.type_suffix_ids, GROUP_none, f.pred);
 	call = f.redirect_call (instance);
-	tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : divisor_cst;
+	tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
 	new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
       }
     else
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 5ca9ec32b69..60350e08372 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -1132,6 +1132,30 @@ report_not_enum (location_t location, tree fndecl, unsigned int argno,
 	    " a valid %qT value", actual, argno + 1, fndecl, enumtype);
 }
 
+/* Try to fold constant arguments arg1 and arg2 using the given tree_code.
+   Operations are not treated as overflowing.  */
+static tree
+aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
+{
+  if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
+    {
+      poly_wide_int poly_res;
+      tree type = TREE_TYPE (arg1);
+      signop sign = TYPE_SIGN (type);
+      wi::overflow_type overflow = wi::OVF_NONE;
+
+      /* Return 0 for division by 0.  */
+      if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
+	return arg2;
+
+      if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
+	return NULL_TREE;
+      return force_fit_type (type, poly_res, false,
+			     TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2));
+    }
+  return NULL_TREE;
+}
+
 /* Return a hash code for a function_instance.  */
 hashval_t
 function_instance::hash () const
@@ -3616,6 +3640,23 @@ gimple_folder::fold ()
   return base->fold (*this);
 }
 
+/* Try to fold constant integer operands.  */
+gimple *
+gimple_folder::fold_const_binary (enum tree_code code)
+{
+  gcc_assert (gimple_call_num_args (call) == 3);
+  tree pg = gimple_call_arg (call, 0);
+  tree op1 = gimple_call_arg (call, 1);
+  tree op2 = gimple_call_arg (call, 2);
+
+  if (type_suffix (0).integer_p
+      && (pred == PRED_x || is_ptrue (pg, type_suffix (0).element_bytes)))
+    if (tree res = vector_const_binop (code, op1, op2, aarch64_const_binop))
+      return gimple_build_assign (lhs, res);
+
+  return NULL;
+}
+
 function_expander::function_expander (const function_instance &instance,
 				      tree fndecl, tree call_expr_in,
 				      rtx possible_target_in)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 9ab6f202c30..22e9a815039 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -639,6 +639,8 @@ public:
 
   gimple *fold ();
 
+  gimple *fold_const_binary (enum tree_code);
+
   /* Where to insert extra statements that feed the final replacement.  */
   gimple_stmt_iterator *gsi;
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
new file mode 100644
index 00000000000..c15b3fc3aa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
@@ -0,0 +1,358 @@
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-options "-O2" } */
+
+#include "arm_sve.h"
+
+/*
+** s64_x_pg:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_x_pg (svbool_t pg)
+{
+  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_0:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint64_t s64_x_pg_0 (svbool_t pg)
+{
+  return svdiv_x (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_by0:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint64_t s64_x_pg_by0 (svbool_t pg)
+{
+  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (0));
+}
+
+/*
+** s64_z_pg:
+**	mov	z[0-9]+\.d, p[0-7]/z, #1
+**	ret
+*/
+svint64_t s64_z_pg (svbool_t pg)
+{
+  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_0:
+**	mov	z[0-9]+\.d, p[0-7]/z, #0
+**	ret
+*/
+svint64_t s64_z_pg_0 (svbool_t pg)
+{
+  return svdiv_z (pg, svdup_s64 (0), svdup_s64 (3));
+}
+
+/*
+** s64_z_pg_by0:
+**	mov	(z[0-9]+\.d), #5
+**	mov	(z[0-9]+)\.b, #0
+**	sdivr	\2\.d, p[0-7]/m, \2\.d, \1
+**	ret
+*/
+svint64_t s64_z_pg_by0 (svbool_t pg)
+{
+  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (0));
+}
+
+/*
+** s64_m_pg:
+**	mov	(z[0-9]+\.d), #3
+**	mov	(z[0-9]+\.d), #5
+**	sdiv	\2, p[0-7]/m, \2, \1
+**	ret
+*/
+svint64_t s64_m_pg (svbool_t pg)
+{
+  return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_x_ptrue ()
+{
+  return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_z_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_z_ptrue ()
+{
+  return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_m_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_m_ptrue ()
+{
+  return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
+}
+
+/*
+** s64_x_pg_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_x_pg_n (svbool_t pg)
+{
+  return svdiv_n_s64_x (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_pg_n_s64_0:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint64_t s64_x_pg_n_s64_0 (svbool_t pg)
+{
+  return svdiv_n_s64_x (pg, svdup_s64 (0), 3);
+}
+
+/*
+** s64_x_pg_n_s64_by0:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint64_t s64_x_pg_n_s64_by0 (svbool_t pg)
+{
+  return svdiv_n_s64_x (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_z_pg_n:
+**	mov	z[0-9]+\.d, p[0-7]/z, #1
+**	ret
+*/
+svint64_t s64_z_pg_n (svbool_t pg)
+{
+  return svdiv_n_s64_z (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_pg_n_s64_0:
+**	mov	z[0-9]+\.d, p[0-7]/z, #0
+**	ret
+*/
+svint64_t s64_z_pg_n_s64_0 (svbool_t pg)
+{
+  return svdiv_n_s64_z (pg, svdup_s64 (0), 3);
+}
+
+/*
+** s64_z_pg_n_s64_by0:
+**	mov	(z[0-9]+\.d), #5
+**	mov	(z[0-9]+)\.b, #0
+**	sdivr	\2\.d, p[0-7]/m, \2\.d, \1
+**	ret
+*/
+svint64_t s64_z_pg_n_s64_by0 (svbool_t pg)
+{
+  return svdiv_n_s64_z (pg, svdup_s64 (5), 0);
+}
+
+/*
+** s64_m_pg_n:
+**	mov	(z[0-9]+\.d), #3
+**	mov	(z[0-9]+\.d), #5
+**	sdiv	\2, p[0-7]/m, \2, \1
+**	ret
+*/
+svint64_t s64_m_pg_n (svbool_t pg)
+{
+  return svdiv_n_s64_m (pg, svdup_s64 (5), 3);
+}
+
+/*
+** s64_x_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_x_ptrue_n ()
+{
+  return svdiv_n_s64_x (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s64_z_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_z_ptrue_n ()
+{
+  return svdiv_n_s64_z (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s64_m_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svint64_t s64_m_ptrue_n ()
+{
+  return svdiv_n_s64_m (svptrue_b64 (), svdup_s64 (5), 3);
+}
+
+/*
+** s32_m_ptrue_dupq:
+**	mov	z[0-9]+\.b, #0
+**	ret
+*/
+svint32_t s32_m_ptrue_dupq ()
+{
+  return svdiv_s32_m (svptrue_b32 (), svdupq_s32 (3, 0, -5, 11),
+		      svdupq_s32 (4, 1, -6, 0));
+}
+
+/*
+** s32_z_ptrue_dupq:
+**	mov	z[0-9]+\.s, #-2
+**	ret
+*/
+svint32_t s32_z_ptrue_dupq ()
+{
+  return svdiv_s32_z (svptrue_b32 (), svdupq_s32 (6, -30, 100, -4),
+		      svdupq_s32 (-3, 15, -50, 2));
+}
+
+/*
+** u64_x_pg:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_x_pg (svbool_t pg)
+{
+  return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_z_pg:
+**	mov	z[0-9]+\.d, p[0-7]/z, #1
+**	ret
+*/
+svuint64_t u64_z_pg (svbool_t pg)
+{
+  return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_m_pg:
+**	mov	(z[0-9]+\.d), #3
+**	mov	(z[0-9]+\.d), #5
+**	udiv	\2, p[0-7]/m, \2, \1
+**	ret
+*/
+svuint64_t u64_m_pg (svbool_t pg)
+{
+  return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_x_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_x_ptrue ()
+{
+  return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_z_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_z_ptrue ()
+{
+  return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_m_ptrue:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_m_ptrue ()
+{
+  return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
+}
+
+/*
+** u64_x_pg_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_x_pg_n (svbool_t pg)
+{
+  return svdiv_n_u64_x (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_z_pg_n:
+**	mov	z[0-9]+\.d, p[0-7]/z, #1
+**	ret
+*/
+svuint64_t u64_z_pg_n (svbool_t pg)
+{
+  return svdiv_n_u64_z (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_m_pg_n:
+**	mov	(z[0-9]+\.d), #3
+**	mov	(z[0-9]+\.d), #5
+**	udiv	\2, p[0-7]/m, \2, \1
+**	ret
+*/
+svuint64_t u64_m_pg_n (svbool_t pg)
+{
+  return svdiv_n_u64_m (pg, svdup_u64 (5), 3);
+}
+
+/*
+** u64_x_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_x_ptrue_n ()
+{
+  return svdiv_n_u64_x (svptrue_b64 (), svdup_u64 (5), 3);
+}
+
+/*
+** u64_z_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_z_ptrue_n ()
+{
+  return svdiv_n_u64_z (svptrue_b64 (), svdup_u64 (5), 3);
+}
+
+/*
+** u64_m_ptrue_n:
+**	mov	z[0-9]+\.d, #1
+**	ret
+*/
+svuint64_t u64_m_ptrue_n ()
+{
+  return svdiv_n_u64_m (svptrue_b64 (), svdup_u64 (5), 3);
+}
-- 
2.34.1



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv.
  2024-09-02 12:05   ` Jennifer Schmitz
@ 2024-09-02 12:35     ` Richard Sandiford
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Sandiford @ 2024-09-02 12:35 UTC (permalink / raw)
  To: Jennifer Schmitz; +Cc: gcc-patches, Richard Biener, Kyrylo Tkachov

Jennifer Schmitz <jschmitz@nvidia.com> writes:
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 5ca9ec32b69..60350e08372 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -1132,6 +1132,30 @@ report_not_enum (location_t location, tree fndecl, unsigned int argno,
>  	    " a valid %qT value", actual, argno + 1, fndecl, enumtype);
>  }
>  
> +/* Try to fold constant arguments arg1 and arg2 using the given tree_code.
> +   Operations are not treated as overflowing.  */

Sorry for the nit, but: the convention is to put argument names in caps,
so ARG1 and ARG2.

> +static tree
> +aarch64_const_binop (enum tree_code code, tree arg1, tree arg2)
> +{
> +  if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
> +    {
> +      poly_wide_int poly_res;
> +      tree type = TREE_TYPE (arg1);
> +      signop sign = TYPE_SIGN (type);
> +      wi::overflow_type overflow = wi::OVF_NONE;
> +
> +      /* Return 0 for division by 0.  */

Maybe add ", like SDIV and UDIV do", to make it clearer where this has
come from.

> +      if (code == TRUNC_DIV_EXPR && integer_zerop (arg2))
> +	return arg2;
> +
> +      if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
> +	return NULL_TREE;
> +      return force_fit_type (type, poly_res, false,
> +			     TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2));
> +    }
> +  return NULL_TREE;
> +}
> +
>  /* Return a hash code for a function_instance.  */
>  hashval_t
>  function_instance::hash () const
> @@ -3616,6 +3640,23 @@ gimple_folder::fold ()
>    return base->fold (*this);
>  }
>  
> +/* Try to fold constant integer operands.  */

Maybe:

/* Try to fold the call to a constant, given that, for integers, the call
   is roughly equivalent to binary operation CODE.  aarch64_const_binop
   handles any differences between CODE and the intrinsic.  */

> +gimple *
> +gimple_folder::fold_const_binary (enum tree_code code)
> +{
> +  gcc_assert (gimple_call_num_args (call) == 3);
> +  tree pg = gimple_call_arg (call, 0);
> +  tree op1 = gimple_call_arg (call, 1);
> +  tree op2 = gimple_call_arg (call, 2);
> +
> +  if (type_suffix (0).integer_p
> +      && (pred == PRED_x || is_ptrue (pg, type_suffix (0).element_bytes)))
> +    if (tree res = vector_const_binop (code, op1, op2, aarch64_const_binop))
> +      return gimple_build_assign (lhs, res);
> +
> +  return NULL;
> +}
> +
>  function_expander::function_expander (const function_instance &instance,
>  				      tree fndecl, tree call_expr_in,
>  				      rtx possible_target_in)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 9ab6f202c30..22e9a815039 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -639,6 +639,8 @@ public:
>  
>    gimple *fold ();
>  
> +  gimple *fold_const_binary (enum tree_code);
> +

Sorry for being so picky, but could you put this above "fold" instead?
fold is the top-level entry point, so I think it should either come
first or last.

OK for trunk with those changes, thanks.

Richard

>    /* Where to insert extra statements that feed the final replacement.  */
>    gimple_stmt_iterator *gsi;
>  

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-09-02 12:35 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-30 11:41 [PATCH 2/3] SVE intrinsics: Fold constant operands for svdiv Jennifer Schmitz
2024-08-30 12:17 ` Richard Sandiford
2024-09-02 12:05   ` Jennifer Schmitz
2024-09-02 12:35     ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).