From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de
Subject: [PATCH 1/2] Add support for conditional xorsign [PR96373]
Date: Fri, 27 Jan 2023 11:06:20 +0000 [thread overview]
Message-ID: <mptpmb0kzir.fsf@arm.com> (raw)
This patch is an optimisation, but it's also a prerequisite for
fixing PR96373 without regressing vect-xorsign_exec.c.
Currently the vectoriser vectorises:
for (i = 0; i < N; i++)
r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);
as two unconditional operations (copysign and mult).
tree-ssa-math-opts.cc later combines them into an "xorsign" function.
This works for both Advanced SIMD and SVE.
However, with the fix for PR96373, the vectoriser will instead
generate a conditional multiplication (IFN_COND_MUL). Something then
needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional
xorsign. Three obvious options were:
(1) Extend tree-ssa-math-opts.cc.
(2) Do the fold in match.pd.
(3) Leave it to rtl combine.
I'm against (3), because this isn't a target-specific optimisation.
(1) would be possible, but would involve open-coding a lot of what
match.pd does for us. And, in contrast to doing the current
tree-ssa-math-opts.cc optimisation in match.pd, there should be
no danger of (2) happening too early. If we have an IFN_COND_MUL
then we're already past the stage of simplifying the original
source code.
There was also a choice between adding a conditional xorsign ifn
and simply open-coding the xorsign. The latter seems simpler,
and means less boiler-plate for target-specific code.
The signed_or_unsigned_type_for change is needed to make sure
that we stay in "SVE space" when doing the optimisation on 128-bit
fixed-length SVE.
Tested on aarch64-linux-gnu. OK to install?
Richard
gcc/
PR tree-optimization/96373
* tree.h (sign_mask_for): Declare.
* tree.cc (sign_mask_for): New function.
(signed_or_unsigned_type_for): For vector types, try to use the
related_int_vector_mode.
* genmatch.cc (commutative_op): Handle conditional internal functions.
* match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and.
gcc/testsuite/
PR tree-optimization/96373
* gcc.target/aarch64/sve/cond_xorsign_1.c: New test.
* gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
---
gcc/genmatch.cc | 15 ++++++++
gcc/match.pd | 14 ++++++++
.../gcc.target/aarch64/sve/cond_xorsign_1.c | 34 +++++++++++++++++++
.../gcc.target/aarch64/sve/cond_xorsign_2.c | 17 ++++++++++
gcc/tree.cc | 33 ++++++++++++++++++
gcc/tree.h | 1 +
6 files changed, 114 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index d4cb439a851..e147ab9db7a 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -489,6 +489,21 @@ commutative_op (id_base *id)
case CFN_FNMS:
return 0;
+ case CFN_COND_ADD:
+ case CFN_COND_MUL:
+ case CFN_COND_MIN:
+ case CFN_COND_MAX:
+ case CFN_COND_FMIN:
+ case CFN_COND_FMAX:
+ case CFN_COND_AND:
+ case CFN_COND_IOR:
+ case CFN_COND_XOR:
+ case CFN_COND_FMA:
+ case CFN_COND_FMS:
+ case CFN_COND_FNMA:
+ case CFN_COND_FNMS:
+ return 1;
+
default:
return -1;
}
diff --git a/gcc/match.pd b/gcc/match.pd
index 56ac743aa6d..f605b798c44 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -339,6 +339,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@0)))
(COPYSIGN_ALL (negate @0) @1)))
+/* Transform c ? x * copysign (1, y) : z to c ? x ^ signs(y) : z.
+ tree-ssa-math-opts.cc does the corresponding optimization for
+ unconditional multiplications (via xorsign). */
+(simplify
+ (IFN_COND_MUL:c @0 @1 (IFN_COPYSIGN real_onep @2) @3)
+ (with { tree signs = sign_mask_for (type); }
+ (if (signs)
+ (with { tree inttype = TREE_TYPE (signs); }
+ (view_convert:type
+ (IFN_COND_XOR:inttype @0
+ (view_convert:inttype @1)
+ (bit_and (view_convert:inttype @2) { signs; })
+ (view_convert:inttype @3)))))))
+
/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */
(simplify
(plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
new file mode 100644
index 00000000000..338ca605923
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define xorsign(A, B, SUFFIX) ((A) * __builtin_copysign##SUFFIX (1.0, B))
+
+#define DEF_LOOP(TYPE, SUFFIX) \
+ void __attribute__ ((noinline, noclone)) \
+ test_##TYPE (TYPE *__restrict r, TYPE *__restrict a, \
+ TYPE *__restrict b, TYPE *__restrict c, \
+ int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ r[i] = a[i] < 20 ? xorsign(b[i], c[i], SUFFIX) : b[i]; \
+ }
+
+#define TEST_ALL(T) \
+ T (_Float16, f16) \
+ T (float, f) \
+ T (double, )
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-not {\tfmul} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c
new file mode 100644
index 00000000000..274dd0ede59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=128 --param aarch64-autovec-preference=2" } */
+
+#include "cond_xorsign_1.c"
+
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-not {\tfmul} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 7473912a065..94d20eaba17 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -2669,6 +2669,35 @@ build_zero_cst (tree type)
}
}
+/* If floating-point type TYPE has an IEEE-style sign bit, return an
+ unsigned constant in which only the sign bit is set. Return null
+ otherwise. */
+
+tree
+sign_mask_for (tree type)
+{
+ /* Avoid having to choose between a real-only sign and a pair of signs.
+ This could be relaxed if the choice becomes obvious later. */
+ if (TREE_CODE (type) == COMPLEX_TYPE)
+ return NULL_TREE;
+
+ auto eltmode = as_a<scalar_float_mode> (element_mode (type));
+ auto bits = REAL_MODE_FORMAT (eltmode)->ieee_bits;
+ if (!bits || !pow2p_hwi (bits))
+ return NULL_TREE;
+
+ tree inttype = unsigned_type_for (type);
+ if (!inttype)
+ return NULL_TREE;
+
+ auto mask = wi::set_bit_in_zero (bits - 1, bits);
+ if (TREE_CODE (inttype) == VECTOR_TYPE)
+ {
+ tree elt = wide_int_to_tree (TREE_TYPE (inttype), mask);
+ return build_vector_from_val (inttype, elt);
+ }
+ return wide_int_to_tree (inttype, mask);
+}
/* Build a BINFO with LEN language slots. */
@@ -10961,6 +10990,10 @@ signed_or_unsigned_type_for (int unsignedp, tree type)
return NULL_TREE;
if (inner == inner2)
return type;
+ machine_mode new_mode;
+ if (VECTOR_MODE_P (TYPE_MODE (type))
+ && related_int_vector_mode (TYPE_MODE (type)).exists (&new_mode))
+ return build_vector_type_for_mode (inner2, new_mode);
return build_vector_type (inner2, TYPE_VECTOR_SUBPARTS (type));
}
diff --git a/gcc/tree.h b/gcc/tree.h
index e730a2a3e56..c656cd5b7bf 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4675,6 +4675,7 @@ extern tree build_one_cst (tree);
extern tree build_minus_one_cst (tree);
extern tree build_all_ones_cst (tree);
extern tree build_zero_cst (tree);
+extern tree sign_mask_for (tree);
extern tree build_string (unsigned, const char * = NULL);
extern tree build_poly_int_cst (tree, const poly_wide_int_ref &);
extern tree build_tree_list (tree, tree CXX_MEM_STAT_INFO);
--
2.25.1
next reply other threads:[~2023-01-27 11:06 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-27 11:06 Richard Sandiford [this message]
2023-01-27 11:36 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mptpmb0kzir.fsf@arm.com \
--to=richard.sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).