* [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
@ 2021-07-05 14:09 Richard Biener
2021-07-05 14:25 ` Richard Biener
2021-07-06 2:16 ` Hongtao Liu
0 siblings, 2 replies; 8+ messages in thread
From: Richard Biener @ 2021-07-05 14:09 UTC (permalink / raw)
To: gcc-patches; +Cc: tamar.christina, ubizjak, hongtao.liu
This adds named expanders for vec_fmaddsub<mode>4 and
vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
vfmsubaddXXXp{ds} instructions. This complements the previous
addition of ADDSUB support.
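Concretely, the lane semantics are as follows (a scalar sketch of my
own, not part of the patch; it assumes lane 0 is the lowest element,
and the expected values in the checks mirror the new testcases below):

```c
#include <assert.h>

/* Scalar reference for vec_fmaddsub<mode>4:
   even lanes compute a*b - c, odd lanes compute a*b + c.  */
static void
fmaddsub_ref (const double *a, const double *b, const double *c,
	      double *out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (i & 1) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}

/* Scalar reference for vec_fmsubadd<mode>4:
   even lanes compute a*b + c, odd lanes compute a*b - c.  */
static void
fmsubadd_ref (const double *a, const double *b, const double *c,
	      double *out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (i & 1) ? a[i] * b[i] - c[i] : a[i] * b[i] + c[i];
}
```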
x86 lacks SUBADD and the negate variants of FMA with mixed
plus/minus, so I did not add optabs or patterns for those, but
it would not be difficult if there's a target that has them.
Maybe one of the complex FMA patterns matches those variants?
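For reference on the naming, here is my own scalar sketch (not part
of the patch): the existing ADDSUB subtracts in even lanes and adds
in odd lanes, and the SUBADD variant x86 lacks would be its mirror
image:

```c
#include <assert.h>

/* Existing vec_addsub<mode>3 semantics:
   even lanes a - b, odd lanes a + b.  */
static void
addsub_ref (const double *a, const double *b, double *out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (i & 1) ? a[i] + b[i] : a[i] - b[i];
}

/* The hypothetical SUBADD variant x86 lacks:
   even lanes a + b, odd lanes a - b.  */
static void
subadd_ref (const double *a, const double *b, double *out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (i & 1) ? a[i] - b[i] : a[i] + b[i];
}
```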
I did not dare to rewrite the numerous existing patterns to the
new canonical names but instead added two new expanders. Note I
did not cover AVX512 since the existing patterns are separate
and I have no easy way to test things there. Handling AVX512
should be easy as a followup though.
Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
Any comments?
Thanks,
Richard.
2021-07-05 Richard Biener <rguenther@suse.de>
* doc/md.texi (vec_fmaddsub<mode>4): Document.
(vec_fmsubadd<mode>4): Likewise.
* optabs.def (vec_fmaddsub$a4): Add.
(vec_fmsubadd$a4): Likewise.
* internal-fn.def (IFN_VEC_FMADDSUB): Add.
(IFN_VEC_FMSUBADD): Likewise.
* tree-vect-slp-patterns.c (addsub_pattern::recognize):
Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
(addsub_pattern::build): Likewise.
* tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
and CFN_VEC_FMSUBADD are not transparent for permutes.
* config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
(vec_fmsubadd<mode>4): Likewise.
* gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
* gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
* gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
* gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
---
gcc/config/i386/sse.md | 19 ++
gcc/doc/md.texi | 14 ++
gcc/internal-fn.def | 3 +-
gcc/optabs.def | 2 +
.../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++
.../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++
.../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++
.../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++
gcc/tree-vect-slp-patterns.c | 192 +++++++++++++-----
gcc/tree-vect-slp.c | 2 +
10 files changed, 311 insertions(+), 57 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bcf1605d147..6fc13c184bf 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4644,6 +4644,25 @@
;;
;; But this doesn't seem useful in practice.
+(define_expand "vec_fmaddsub<mode>4"
+ [(set (match_operand:VF 0 "register_operand")
+ (unspec:VF
+ [(match_operand:VF 1 "nonimmediate_operand")
+ (match_operand:VF 2 "nonimmediate_operand")
+ (match_operand:VF 3 "nonimmediate_operand")]
+ UNSPEC_FMADDSUB))]
+ "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
+
+(define_expand "vec_fmsubadd<mode>4"
+ [(set (match_operand:VF 0 "register_operand")
+ (unspec:VF
+ [(match_operand:VF 1 "nonimmediate_operand")
+ (match_operand:VF 2 "nonimmediate_operand")
+ (neg:VF
+ (match_operand:VF 3 "nonimmediate_operand"))]
+ UNSPEC_FMADDSUB))]
+ "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
+
(define_expand "fmaddsub_<mode>"
[(set (match_operand:VF 0 "register_operand")
(unspec:VF
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 1b918144330..cc92ebd26aa 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd
lanes doing addition. Operands 1 and 2 and the output operand are vectors
with mode @var{m}.
+@cindex @code{vec_fmaddsub@var{m}4} instruction pattern
+@item @samp{vec_fmaddsub@var{m}4}
+Alternating multiply subtract, add with even lanes doing subtract and odd
+lanes doing addition of the third operand to the multiplication result
+of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
+with mode @var{m}.
+
+@cindex @code{vec_fmsubadd@var{m}4} instruction pattern
+@item @samp{vec_fmsubadd@var{m}4}
+Alternating multiply add, subtract with even lanes doing addition and odd
+lanes doing subtraction of the third operand to the multiplication result
+of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
+with mode @var{m}.
+
These instructions are not allowed to @code{FAIL}.
@cindex @code{mulhisi3} instruction pattern
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c3b8e730960..a7003d5da8e 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
-
+DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
+DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
/* FP scales. */
DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 41ab2598eb6..51acc1be8f5 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
+OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
+OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
OPTAB_D (sync_add_optab, "sync_add$I$a")
OPTAB_D (sync_and_optab, "sync_and$I$a")
diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
new file mode 100644
index 00000000000..b30d10731a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O3 -mfma -save-temps" } */
+
+#include "fma-check.h"
+
+void __attribute__((noipa))
+check_fmaddsub (double * __restrict a, double *b, double *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
+ a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
+ }
+}
+
+static void
+fma_test (void)
+{
+ double a[4], b[4], c[4];
+ for (int i = 0; i < 4; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmaddsub (a, b, c, 2);
+ const double d[4] = { 0., 22., 82., 192. };
+ for (int i = 0; i < 4; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler "fmaddsub...pd" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
new file mode 100644
index 00000000000..cd2af8725a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O3 -mfma -save-temps" } */
+
+#include "fma-check.h"
+
+void __attribute__((noipa))
+check_fmaddsub (float * __restrict a, float *b, float *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
+ a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
+ }
+}
+
+static void
+fma_test (void)
+{
+ float a[4], b[4], c[4];
+ for (int i = 0; i < 4; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmaddsub (a, b, c, 2);
+ const float d[4] = { 0., 22., 82., 192. };
+ for (int i = 0; i < 4; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler "fmaddsub...ps" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
new file mode 100644
index 00000000000..7ca2a275cc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O3 -mfma -save-temps" } */
+
+#include "fma-check.h"
+
+void __attribute__((noipa))
+check_fmsubadd (double * __restrict a, double *b, double *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
+ a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
+ }
+}
+
+static void
+fma_test (void)
+{
+ double a[4], b[4], c[4];
+ for (int i = 0; i < 4; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmsubadd (a, b, c, 2);
+ const double d[4] = { 0., 20., 86., 186. };
+ for (int i = 0; i < 4; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler "fmsubadd...pd" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
new file mode 100644
index 00000000000..9ddd0e423db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O3 -mfma -save-temps" } */
+
+#include "fma-check.h"
+
+void __attribute__((noipa))
+check_fmsubadd (float * __restrict a, float *b, float *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
+ a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
+ }
+}
+
+static void
+fma_test (void)
+{
+ float a[4], b[4], c[4];
+ for (int i = 0; i < 4; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmsubadd (a, b, c, 2);
+ const float d[4] = { 0., 20., 86., 186. };
+ for (int i = 0; i < 4; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler "fmsubadd...ps" } } */
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 2671f91972d..f774cac4a4d 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
class addsub_pattern : public vect_pattern
{
public:
- addsub_pattern (slp_tree *node)
- : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {};
+ addsub_pattern (slp_tree *node, internal_fn ifn)
+ : vect_pattern (node, NULL, ifn) {};
void build (vec_info *);
@@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
{
slp_tree node = *node_;
if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
- || SLP_TREE_CHILDREN (node).length () != 2)
+ || SLP_TREE_CHILDREN (node).length () != 2
+ || SLP_TREE_LANE_PERMUTATION (node).length () % 2)
return NULL;
/* Match a blend of a plus and a minus op with the same number of plus and
minus lanes on the same operands. */
- slp_tree sub = SLP_TREE_CHILDREN (node)[0];
- slp_tree add = SLP_TREE_CHILDREN (node)[1];
- bool swapped_p = false;
- if (vect_match_expression_p (sub, PLUS_EXPR))
- {
- std::swap (add, sub);
- swapped_p = true;
- }
- if (!(vect_match_expression_p (add, PLUS_EXPR)
- && vect_match_expression_p (sub, MINUS_EXPR)))
+ unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
+ unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
+ if (l0 == l1)
+ return NULL;
+ bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0],
+ PLUS_EXPR);
+ if (!l0add_p
+ && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR))
+ return NULL;
+ bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1],
+ PLUS_EXPR);
+ if (!l1add_p
+ && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR))
return NULL;
- if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
- && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
- || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
- && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
+
+ slp_tree l0node = SLP_TREE_CHILDREN (node)[l0];
+ slp_tree l1node = SLP_TREE_CHILDREN (node)[l1];
+ if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0]
+ && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1])
+ || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1]
+ && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0])))
return NULL;
for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
{
std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
- if (swapped_p)
- perm.first = perm.first == 0 ? 1 : 0;
- /* It has to be alternating -, +, -, ...
+ /* It has to be alternating -, +, -,
While we could permute the .ADDSUB inputs and the .ADDSUB output
that's only profitable over the add + sub + blend if at least
one of the permute is optimized which we can't determine here. */
- if (perm.first != (i & 1)
+ if (perm.first != ((i & 1) ? l1 : l0)
|| perm.second != i)
return NULL;
}
- if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
- return NULL;
+ /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... }
+ (l0add_p), see whether we have FMA variants. */
+ if (!l0add_p
+ && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
+ {
+ /* (c * d) -+ a */
+ if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node))
+ return new addsub_pattern (node_, IFN_VEC_FMADDSUB);
+ }
+ else if (l0add_p
+ && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR))
+ {
+ /* (c * d) +- a */
+ if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node))
+ return new addsub_pattern (node_, IFN_VEC_FMSUBADD);
+ }
- return new addsub_pattern (node_);
+ if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
+ return new addsub_pattern (node_, IFN_VEC_ADDSUB);
+
+ return NULL;
}
void
@@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo)
{
slp_tree node = *m_node;
- slp_tree sub = SLP_TREE_CHILDREN (node)[0];
- slp_tree add = SLP_TREE_CHILDREN (node)[1];
- if (vect_match_expression_p (sub, PLUS_EXPR))
- std::swap (add, sub);
-
- /* Modify the blend node in-place. */
- SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
- SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
- SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
- SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
-
- /* Build IFN_VEC_ADDSUB from the sub representative operands. */
- stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
- gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
- gimple_assign_rhs1 (rep->stmt),
- gimple_assign_rhs2 (rep->stmt));
- gimple_call_set_lhs (call, make_ssa_name
- (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
- gimple_call_set_nothrow (call, true);
- gimple_set_bb (call, gimple_bb (rep->stmt));
- stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
- SLP_TREE_REPRESENTATIVE (node) = new_rep;
- STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
- STMT_SLP_TYPE (new_rep) = pure_slp;
- STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
- STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
- STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
- SLP_TREE_CODE (node) = ERROR_MARK;
- SLP_TREE_LANE_PERMUTATION (node).release ();
-
- vect_free_slp_tree (sub);
- vect_free_slp_tree (add);
+ unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
+ unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
+
+ switch (m_ifn)
+ {
+ case IFN_VEC_ADDSUB:
+ {
+ slp_tree sub = SLP_TREE_CHILDREN (node)[l0];
+ slp_tree add = SLP_TREE_CHILDREN (node)[l1];
+
+ /* Modify the blend node in-place. */
+ SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
+ SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
+
+ /* Build IFN_VEC_ADDSUB from the sub representative operands. */
+ stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
+ gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
+ gimple_assign_rhs1 (rep->stmt),
+ gimple_assign_rhs2 (rep->stmt));
+ gimple_call_set_lhs (call, make_ssa_name
+ (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
+ gimple_call_set_nothrow (call, true);
+ gimple_set_bb (call, gimple_bb (rep->stmt));
+ stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
+ SLP_TREE_REPRESENTATIVE (node) = new_rep;
+ STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
+ STMT_SLP_TYPE (new_rep) = pure_slp;
+ STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
+ STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
+ STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
+ SLP_TREE_CODE (node) = ERROR_MARK;
+ SLP_TREE_LANE_PERMUTATION (node).release ();
+
+ vect_free_slp_tree (sub);
+ vect_free_slp_tree (add);
+ break;
+ }
+ case IFN_VEC_FMADDSUB:
+ case IFN_VEC_FMSUBADD:
+ {
+ slp_tree sub, add;
+ if (m_ifn == IFN_VEC_FMADDSUB)
+ {
+ sub = SLP_TREE_CHILDREN (node)[l0];
+ add = SLP_TREE_CHILDREN (node)[l1];
+ }
+ else /* m_ifn == IFN_VEC_FMSUBADD */
+ {
+ sub = SLP_TREE_CHILDREN (node)[l1];
+ add = SLP_TREE_CHILDREN (node)[l0];
+ }
+ slp_tree mul = SLP_TREE_CHILDREN (sub)[0];
+ /* Modify the blend node in-place. */
+ SLP_TREE_CHILDREN (node).safe_grow (3, true);
+ SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0];
+ SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1];
+ SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1];
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++;
+
+ /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. */
+ stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub);
+ stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul);
+ gcall *call = gimple_build_call_internal (m_ifn, 3,
+ gimple_assign_rhs1 (mrep->stmt),
+ gimple_assign_rhs2 (mrep->stmt),
+ gimple_assign_rhs2 (srep->stmt));
+ gimple_call_set_lhs (call, make_ssa_name
+ (TREE_TYPE (gimple_assign_lhs (srep->stmt))));
+ gimple_call_set_nothrow (call, true);
+ gimple_set_bb (call, gimple_bb (srep->stmt));
+ stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep);
+ SLP_TREE_REPRESENTATIVE (node) = new_rep;
+ STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
+ STMT_SLP_TYPE (new_rep) = pure_slp;
+ STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
+ STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
+ STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep));
+ SLP_TREE_CODE (node) = ERROR_MARK;
+ SLP_TREE_LANE_PERMUTATION (node).release ();
+
+ vect_free_slp_tree (sub);
+ vect_free_slp_tree (add);
+ break;
+ }
+ default:;
+ }
}
/*******************************************************************************
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index f08797c2bc0..5357cd0e7a4 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo)
case CFN_COMPLEX_MUL:
case CFN_COMPLEX_MUL_CONJ:
case CFN_VEC_ADDSUB:
+ case CFN_VEC_FMADDSUB:
+ case CFN_VEC_FMSUBADD:
vertices[idx].perm_in = 0;
vertices[idx].perm_out = 0;
default:;
--
2.26.2
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
2021-07-05 14:09 [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs Richard Biener
@ 2021-07-05 14:25 ` Richard Biener
2021-07-05 14:38 ` Richard Biener
2021-07-06 2:16 ` Hongtao Liu
1 sibling, 1 reply; 8+ messages in thread
From: Richard Biener @ 2021-07-05 14:25 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Hongtao Liu
On Mon, Jul 5, 2021 at 4:09 PM Richard Biener <rguenther@suse.de> wrote:
>
> This adds named expanders for vec_fmaddsub<mode>4 and
> vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> vfmsubaddXXXp{ds} instructions. This complements the previous
> addition of ADDSUB support.
>
> x86 lacks SUBADD and the negate variants of FMA with mixed
> plus minus so I did not add optabs or patterns for those but
> it would not be difficult if there's a target that has them.
> Maybe one of the complex fma patterns match those variants?
>
> I did not dare to rewrite the numerous patterns to the new
> canonical name but instead added two new expanders. Note I
> did not cover AVX512 since the existing patterns are separated
> and I have no easy way to test things there. Handling AVX512
> should be easy as followup though.
>
> Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
FYI, while building libgfortran's matmul_c4 we hit
/home/rguenther/src/trunk/libgfortran/generated/matmul_c4.c:1781:1:
error: unrecognizable insn:
1781 | }
| ^
(insn 5408 5407 5409 213 (set (reg:V8SF 1454 [ vect__4368.5363 ])
(unspec:V8SF [
(reg:V8SF 4391)
(reg:V8SF 4398)
(reg:V8SF 4415 [ vect__2005.5362 ])
] UNSPEC_FMADDSUB)) -1
(nil))
during RTL pass: vregs
so it looks like the existing fmaddsub_<mode> expander cannot be
simply re-purposed?
> Any comments?
>
> Thanks,
> Richard.
> + {
> + /* (c * d) +- a */
> + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node))
> + return new addsub_pattern (node_, IFN_VEC_FMSUBADD);
> + }
>
> - return new addsub_pattern (node_);
> + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> + return new addsub_pattern (node_, IFN_VEC_ADDSUB);
> +
> + return NULL;
> }
>
> void
> @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo)
> {
> slp_tree node = *m_node;
>
> - slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> - slp_tree add = SLP_TREE_CHILDREN (node)[1];
> - if (vect_match_expression_p (sub, PLUS_EXPR))
> - std::swap (add, sub);
> -
> - /* Modify the blend node in-place. */
> - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> -
> - /* Build IFN_VEC_ADDSUB from the sub representative operands. */
> - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> - gimple_assign_rhs1 (rep->stmt),
> - gimple_assign_rhs2 (rep->stmt));
> - gimple_call_set_lhs (call, make_ssa_name
> - (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> - gimple_call_set_nothrow (call, true);
> - gimple_set_bb (call, gimple_bb (rep->stmt));
> - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> - SLP_TREE_REPRESENTATIVE (node) = new_rep;
> - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> - STMT_SLP_TYPE (new_rep) = pure_slp;
> - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> - SLP_TREE_CODE (node) = ERROR_MARK;
> - SLP_TREE_LANE_PERMUTATION (node).release ();
> -
> - vect_free_slp_tree (sub);
> - vect_free_slp_tree (add);
> + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> +
> + switch (m_ifn)
> + {
> + case IFN_VEC_ADDSUB:
> + {
> + slp_tree sub = SLP_TREE_CHILDREN (node)[l0];
> + slp_tree add = SLP_TREE_CHILDREN (node)[l1];
> +
> + /* Modify the blend node in-place. */
> + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> +
> + /* Build IFN_VEC_ADDSUB from the sub representative operands. */
> + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> + gimple_assign_rhs1 (rep->stmt),
> + gimple_assign_rhs2 (rep->stmt));
> + gimple_call_set_lhs (call, make_ssa_name
> + (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> + gimple_call_set_nothrow (call, true);
> + gimple_set_bb (call, gimple_bb (rep->stmt));
> + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> + SLP_TREE_REPRESENTATIVE (node) = new_rep;
> + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> + STMT_SLP_TYPE (new_rep) = pure_slp;
> + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> + SLP_TREE_CODE (node) = ERROR_MARK;
> + SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> + vect_free_slp_tree (sub);
> + vect_free_slp_tree (add);
> + break;
> + }
> + case IFN_VEC_FMADDSUB:
> + case IFN_VEC_FMSUBADD:
> + {
> + slp_tree sub, add;
> + if (m_ifn == IFN_VEC_FMADDSUB)
> + {
> + sub = SLP_TREE_CHILDREN (node)[l0];
> + add = SLP_TREE_CHILDREN (node)[l1];
> + }
> + else /* m_ifn == IFN_VEC_FMSUBADD */
> + {
> + sub = SLP_TREE_CHILDREN (node)[l1];
> + add = SLP_TREE_CHILDREN (node)[l0];
> + }
> + slp_tree mul = SLP_TREE_CHILDREN (sub)[0];
> + /* Modify the blend node in-place. */
> + SLP_TREE_CHILDREN (node).safe_grow (3, true);
> + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0];
> + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1];
> + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1];
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++;
> +
> + /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. */
> + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub);
> + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul);
> + gcall *call = gimple_build_call_internal (m_ifn, 3,
> + gimple_assign_rhs1 (mrep->stmt),
> + gimple_assign_rhs2 (mrep->stmt),
> + gimple_assign_rhs2 (srep->stmt));
> + gimple_call_set_lhs (call, make_ssa_name
> + (TREE_TYPE (gimple_assign_lhs (srep->stmt))));
> + gimple_call_set_nothrow (call, true);
> + gimple_set_bb (call, gimple_bb (srep->stmt));
> + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep);
> + SLP_TREE_REPRESENTATIVE (node) = new_rep;
> + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> + STMT_SLP_TYPE (new_rep) = pure_slp;
> + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep));
> + SLP_TREE_CODE (node) = ERROR_MARK;
> + SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> + vect_free_slp_tree (sub);
> + vect_free_slp_tree (add);
> + break;
> + }
> + default:;
> + }
> }
>
> /*******************************************************************************
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index f08797c2bc0..5357cd0e7a4 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo)
> case CFN_COMPLEX_MUL:
> case CFN_COMPLEX_MUL_CONJ:
> case CFN_VEC_ADDSUB:
> + case CFN_VEC_FMADDSUB:
> + case CFN_VEC_FMSUBADD:
> vertices[idx].perm_in = 0;
> vertices[idx].perm_out = 0;
> default:;
> --
> 2.26.2
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
2021-07-05 14:25 ` Richard Biener
@ 2021-07-05 14:38 ` Richard Biener
0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2021-07-05 14:38 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Hongtao Liu
On Mon, 5 Jul 2021, Richard Biener wrote:
> On Mon, Jul 5, 2021 at 4:09 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > This adds named expanders for vec_fmaddsub<mode>4 and
> > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> > vfmsubaddXXXp{ds} instructions. This complements the previous
> > addition of ADDSUB support.
> >
> > x86 lacks SUBADD and the negate variants of FMA with mixed
> > plus minus so I did not add optabs or patterns for those but
> > it would not be difficult if there's a target that has them.
> > Maybe one of the complex fma patterns match those variants?
> >
> > I did not dare to rewrite the numerous patterns to the new
> > canonical name but instead added two new expanders. Note I
> > did not cover AVX512 since the existing patterns are separated
> > and I have no easy way to test things there. Handling AVX512
> > should be easy as followup though.
> >
> > Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
>
> FYI, building libgfortran matmul_c4 we hit
>
> /home/rguenther/src/trunk/libgfortran/generated/matmul_c4.c:1781:1:
> error: unrecognizable insn:
> 1781 | }
> | ^
> (insn 5408 5407 5409 213 (set (reg:V8SF 1454 [ vect__4368.5363 ])
> (unspec:V8SF [
> (reg:V8SF 4391)
> (reg:V8SF 4398)
> (reg:V8SF 4415 [ vect__2005.5362 ])
> ] UNSPEC_FMADDSUB)) -1
> (nil))
> during RTL pass: vregs
>
> so it looks like the existing fmaddsub_<mode> expander cannot be
> simply re-purposed?
Ah, using the VF_128_256 iterator and removing the || TARGET_AVX512F
predication fixes it. There's an avx512f but not fma target variant
of matmul which likely lacks avx512vl for the above. So consider it
changed this way. Not sure if there's a more appropriate iterator
that catches this case.
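Concretely, the adjusted expander would look something like the following (a sketch of the change just described, i.e. the VF_128_256 iterator with the || TARGET_AVX512F predication dropped; the committed form may differ):

```lisp
;; Sketch only: vec_fmaddsub<mode>4 restricted to 128/256-bit modes so
;; it is not used when only AVX512F (without AVX512VL) is available.
(define_expand "vec_fmaddsub<mode>4"
  [(set (match_operand:VF_128_256 0 "register_operand")
	(unspec:VF_128_256
	  [(match_operand:VF_128_256 1 "nonimmediate_operand")
	   (match_operand:VF_128_256 2 "nonimmediate_operand")
	   (match_operand:VF_128_256 3 "nonimmediate_operand")]
	  UNSPEC_FMADDSUB))]
  "TARGET_FMA || TARGET_FMA4")
```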
Richard.
> > Any comments?
> >
> > Thanks,
> > Richard.
> >
> > 2021-07-05 Richard Biener <rguenther@suse.de>
> >
> > * doc/md.texi (vec_fmaddsub<mode>4): Document.
> > (vec_fmsubadd<mode>4): Likewise.
> > * optabs.def (vec_fmaddsub$a4): Add.
> > (vec_fmsubadd$a4): Likewise.
> > * internal-fn.def (IFN_VEC_FMADDSUB): Add.
> > (IFN_VEC_FMSUBADD): Likewise.
> > * tree-vect-slp-patterns.c (addsub_pattern::recognize):
> > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
> > (addsub_pattern::build): Likewise.
> > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
> > and CFN_VEC_FMSUBADD are not transparent for permutes.
> > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
> > (vec_fmsubadd<mode>4): Likewise.
> >
> > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
> > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
> > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
> > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> > ---
> > gcc/config/i386/sse.md | 19 ++
> > gcc/doc/md.texi | 14 ++
> > gcc/internal-fn.def | 3 +-
> > gcc/optabs.def | 2 +
> > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++
> > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++
> > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++
> > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++
> > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++-----
> > gcc/tree-vect-slp.c | 2 +
> > 10 files changed, 311 insertions(+), 57 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index bcf1605d147..6fc13c184bf 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -4644,6 +4644,25 @@
> > ;;
> > ;; But this doesn't seem useful in practice.
> >
> > +(define_expand "vec_fmaddsub<mode>4"
> > + [(set (match_operand:VF 0 "register_operand")
> > + (unspec:VF
> > + [(match_operand:VF 1 "nonimmediate_operand")
> > + (match_operand:VF 2 "nonimmediate_operand")
> > + (match_operand:VF 3 "nonimmediate_operand")]
> > + UNSPEC_FMADDSUB))]
> > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > +
> > +(define_expand "vec_fmsubadd<mode>4"
> > + [(set (match_operand:VF 0 "register_operand")
> > + (unspec:VF
> > + [(match_operand:VF 1 "nonimmediate_operand")
> > + (match_operand:VF 2 "nonimmediate_operand")
> > + (neg:VF
> > + (match_operand:VF 3 "nonimmediate_operand"))]
> > + UNSPEC_FMADDSUB))]
> > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > +
> > (define_expand "fmaddsub_<mode>"
> > [(set (match_operand:VF 0 "register_operand")
> > (unspec:VF
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 1b918144330..cc92ebd26aa 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd
> > lanes doing addition. Operands 1 and 2 and the output operand are vectors
> > with mode @var{m}.
> >
> > +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern
> > +@item @samp{vec_fmaddsub@var{m}4}
> > +Alternating multiply subtract, add with even lanes doing subtract and odd
> > +lanes doing addition of the third operand to the multiplication result
> > +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
> > +with mode @var{m}.
> > +
> > +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern
> > +@item @samp{vec_fmsubadd@var{m}4}
> > +Alternating multiply add, subtract with even lanes doing addition and odd
> > +lanes doing subtraction of the third operand to the multiplication result
> > +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
> > +with mode @var{m}.
> > +
> > These instructions are not allowed to @code{FAIL}.
> >
> > @cindex @code{mulhisi3} instruction pattern
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index c3b8e730960..a7003d5da8e 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
> > DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
> > -
> > +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> > +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
> >
> > /* FP scales. */
> > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 41ab2598eb6..51acc1be8f5 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> > OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> > +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> > +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
> >
> > OPTAB_D (sync_add_optab, "sync_add$I$a")
> > OPTAB_D (sync_and_optab, "sync_and$I$a")
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > new file mode 100644
> > index 00000000000..b30d10731a7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fma } */
> > +/* { dg-options "-O3 -mfma -save-temps" } */
> > +
> > +#include "fma-check.h"
> > +
> > +void __attribute__((noipa))
> > +check_fmaddsub (double * __restrict a, double *b, double *c, int n)
> > +{
> > + for (int i = 0; i < n; ++i)
> > + {
> > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> > + }
> > +}
> > +
> > +static void
> > +fma_test (void)
> > +{
> > + double a[4], b[4], c[4];
> > + for (int i = 0; i < 4; ++i)
> > + {
> > + a[i] = i;
> > + b[i] = 3*i;
> > + c[i] = 7*i;
> > + }
> > + check_fmaddsub (a, b, c, 2);
> > + const double d[4] = { 0., 22., 82., 192. };
> > + for (int i = 0; i < 4; ++i)
> > + if (a[i] != d[i])
> > + __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "fmaddsub...pd" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > new file mode 100644
> > index 00000000000..cd2af8725a3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fma } */
> > +/* { dg-options "-O3 -mfma -save-temps" } */
> > +
> > +#include "fma-check.h"
> > +
> > +void __attribute__((noipa))
> > +check_fmaddsub (float * __restrict a, float *b, float *c, int n)
> > +{
> > + for (int i = 0; i < n; ++i)
> > + {
> > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> > + }
> > +}
> > +
> > +static void
> > +fma_test (void)
> > +{
> > + float a[4], b[4], c[4];
> > + for (int i = 0; i < 4; ++i)
> > + {
> > + a[i] = i;
> > + b[i] = 3*i;
> > + c[i] = 7*i;
> > + }
> > + check_fmaddsub (a, b, c, 2);
> > + const float d[4] = { 0., 22., 82., 192. };
> > + for (int i = 0; i < 4; ++i)
> > + if (a[i] != d[i])
> > + __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "fmaddsub...ps" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > new file mode 100644
> > index 00000000000..7ca2a275cc1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fma } */
> > +/* { dg-options "-O3 -mfma -save-temps" } */
> > +
> > +#include "fma-check.h"
> > +
> > +void __attribute__((noipa))
> > +check_fmsubadd (double * __restrict a, double *b, double *c, int n)
> > +{
> > + for (int i = 0; i < n; ++i)
> > + {
> > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> > + }
> > +}
> > +
> > +static void
> > +fma_test (void)
> > +{
> > + double a[4], b[4], c[4];
> > + for (int i = 0; i < 4; ++i)
> > + {
> > + a[i] = i;
> > + b[i] = 3*i;
> > + c[i] = 7*i;
> > + }
> > + check_fmsubadd (a, b, c, 2);
> > + const double d[4] = { 0., 20., 86., 186. };
> > + for (int i = 0; i < 4; ++i)
> > + if (a[i] != d[i])
> > + __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "fmsubadd...pd" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> > new file mode 100644
> > index 00000000000..9ddd0e423db
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target fma } */
> > +/* { dg-options "-O3 -mfma -save-temps" } */
> > +
> > +#include "fma-check.h"
> > +
> > +void __attribute__((noipa))
> > +check_fmsubadd (float * __restrict a, float *b, float *c, int n)
> > +{
> > + for (int i = 0; i < n; ++i)
> > + {
> > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> > + }
> > +}
> > +
> > +static void
> > +fma_test (void)
> > +{
> > + float a[4], b[4], c[4];
> > + for (int i = 0; i < 4; ++i)
> > + {
> > + a[i] = i;
> > + b[i] = 3*i;
> > + c[i] = 7*i;
> > + }
> > + check_fmsubadd (a, b, c, 2);
> > + const float d[4] = { 0., 20., 86., 186. };
> > + for (int i = 0; i < 4; ++i)
> > + if (a[i] != d[i])
> > + __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "fmsubadd...ps" } } */
> > [...]
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
2021-07-05 14:09 [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs Richard Biener
2021-07-05 14:25 ` Richard Biener
@ 2021-07-06 2:16 ` Hongtao Liu
2021-07-06 7:42 ` Richard Biener
1 sibling, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2021-07-06 2:16 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Liu, Hongtao
On Mon, Jul 5, 2021 at 10:09 PM Richard Biener <rguenther@suse.de> wrote:
>
> This adds named expanders for vec_fmaddsub<mode>4 and
> vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> vfmsubaddXXXp{ds} instructions. This complements the previous
> addition of ADDSUB support.
>
> x86 lacks SUBADD and the negate variants of FMA with mixed
> plus minus so I did not add optabs or patterns for those but
> it would not be difficult if there's a target that has them.
> Maybe one of the complex fma patterns match those variants?
>
> I did not dare to rewrite the numerous patterns to the new
> canonical name but instead added two new expanders. Note I
> did not cover AVX512 since the existing patterns are separated
> and I have no easy way to test things there. Handling AVX512
> should be easy as followup though.
>
> Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
>
> Any comments?
>
> Thanks,
> Richard.
>
> 2021-07-05 Richard Biener <rguenther@suse.de>
>
> * doc/md.texi (vec_fmaddsub<mode>4): Document.
> (vec_fmsubadd<mode>4): Likewise.
> * optabs.def (vec_fmaddsub$a4): Add.
> (vec_fmsubadd$a4): Likewise.
> * internal-fn.def (IFN_VEC_FMADDSUB): Add.
> (IFN_VEC_FMSUBADD): Likewise.
> * tree-vect-slp-patterns.c (addsub_pattern::recognize):
> Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
> (addsub_pattern::build): Likewise.
> * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
> and CFN_VEC_FMSUBADD are not transparent for permutes.
> * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
> (vec_fmsubadd<mode>4): Likewise.
>
> * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
> * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
> * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
> * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> ---
> gcc/config/i386/sse.md | 19 ++
> gcc/doc/md.texi | 14 ++
> gcc/internal-fn.def | 3 +-
> gcc/optabs.def | 2 +
> .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++
> .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++
> .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++
> .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++
> gcc/tree-vect-slp-patterns.c | 192 +++++++++++++-----
> gcc/tree-vect-slp.c | 2 +
> 10 files changed, 311 insertions(+), 57 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index bcf1605d147..6fc13c184bf 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -4644,6 +4644,25 @@
> ;;
> ;; But this doesn't seem useful in practice.
>
> +(define_expand "vec_fmaddsub<mode>4"
> + [(set (match_operand:VF 0 "register_operand")
> + (unspec:VF
> + [(match_operand:VF 1 "nonimmediate_operand")
> + (match_operand:VF 2 "nonimmediate_operand")
> + (match_operand:VF 3 "nonimmediate_operand")]
> + UNSPEC_FMADDSUB))]
> + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> +
> +(define_expand "vec_fmsubadd<mode>4"
> + [(set (match_operand:VF 0 "register_operand")
> + (unspec:VF
> + [(match_operand:VF 1 "nonimmediate_operand")
> + (match_operand:VF 2 "nonimmediate_operand")
> + (neg:VF
> + (match_operand:VF 3 "nonimmediate_operand"))]
> + UNSPEC_FMADDSUB))]
> + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> +
W/ condition like
"TARGET_FMA || TARGET_FMA4
|| (<MODE_SIZE> == 64 || TARGET_AVX512VL)"?
The original expander "fmaddsub_<mode>" is only used by builtins which
have their own guard for AVX512VL, so it doesn't matter that it lacks
TARGET_AVX512VL:
BDESC (OPTION_MASK_ISA_AVX512VL, 0,
CODE_FOR_avx512vl_fmaddsub_v4df_mask,
"__builtin_ia32_vfmaddsubpd256_mask",
IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int)
V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
> (define_expand "fmaddsub_<mode>"
> [(set (match_operand:VF 0 "register_operand")
> (unspec:VF
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 1b918144330..cc92ebd26aa 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd
> lanes doing addition. Operands 1 and 2 and the output operand are vectors
> with mode @var{m}.
>
> +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern
> +@item @samp{vec_fmaddsub@var{m}4}
> +Alternating multiply subtract, add with even lanes doing subtract and odd
> +lanes doing addition of the third operand to the multiplication result
> +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
> +with mode @var{m}.
> +
> +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern
> +@item @samp{vec_fmsubadd@var{m}4}
> +Alternating multiply add, subtract with even lanes doing addition and odd
> +lanes doing subtraction of the third operand to the multiplication result
> +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
> +with mode @var{m}.
> +
> These instructions are not allowed to @code{FAIL}.
>
> @cindex @code{mulhisi3} instruction pattern
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index c3b8e730960..a7003d5da8e 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
> DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
> -
> +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>
> /* FP scales. */
> DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 41ab2598eb6..51acc1be8f5 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
>
> OPTAB_D (sync_add_optab, "sync_add$I$a")
> OPTAB_D (sync_and_optab, "sync_and$I$a")
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> new file mode 100644
> index 00000000000..b30d10731a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmaddsub (double * __restrict a, double *b, double *c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> + }
> +}
> +
> +static void
> +fma_test (void)
> +{
> + double a[4], b[4], c[4];
> + for (int i = 0; i < 4; ++i)
> + {
> + a[i] = i;
> + b[i] = 3*i;
> + c[i] = 7*i;
> + }
> + check_fmaddsub (a, b, c, 2);
> + const double d[4] = { 0., 22., 82., 192. };
> + for (int i = 0; i < 4; ++i)
> + if (a[i] != d[i])
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmaddsub...pd" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> new file mode 100644
> index 00000000000..cd2af8725a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmaddsub (float * __restrict a, float *b, float *c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> + }
> +}
> +
> +static void
> +fma_test (void)
> +{
> + float a[4], b[4], c[4];
> + for (int i = 0; i < 4; ++i)
> + {
> + a[i] = i;
> + b[i] = 3*i;
> + c[i] = 7*i;
> + }
> + check_fmaddsub (a, b, c, 2);
> + const float d[4] = { 0., 22., 82., 192. };
> + for (int i = 0; i < 4; ++i)
> + if (a[i] != d[i])
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmaddsub...ps" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> new file mode 100644
> index 00000000000..7ca2a275cc1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmsubadd (double * __restrict a, double *b, double *c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> + }
> +}
> +
> +static void
> +fma_test (void)
> +{
> + double a[4], b[4], c[4];
> + for (int i = 0; i < 4; ++i)
> + {
> + a[i] = i;
> + b[i] = 3*i;
> + c[i] = 7*i;
> + }
> + check_fmsubadd (a, b, c, 2);
> + const double d[4] = { 0., 20., 86., 186. };
> + for (int i = 0; i < 4; ++i)
> + if (a[i] != d[i])
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmsubadd...pd" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> new file mode 100644
> index 00000000000..9ddd0e423db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmsubadd (float * __restrict a, float *b, float *c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> + }
> +}
> +
> +static void
> +fma_test (void)
> +{
> + float a[4], b[4], c[4];
> + for (int i = 0; i < 4; ++i)
> + {
> + a[i] = i;
> + b[i] = 3*i;
> + c[i] = 7*i;
> + }
> + check_fmsubadd (a, b, c, 2);
> + const float d[4] = { 0., 20., 86., 186. };
> + for (int i = 0; i < 4; ++i)
> + if (a[i] != d[i])
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmsubadd...ps" } } */
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 2671f91972d..f774cac4a4d 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
> class addsub_pattern : public vect_pattern
> {
> public:
> - addsub_pattern (slp_tree *node)
> - : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {};
> + addsub_pattern (slp_tree *node, internal_fn ifn)
> + : vect_pattern (node, NULL, ifn) {};
>
> void build (vec_info *);
>
> @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
> {
> slp_tree node = *node_;
> if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> - || SLP_TREE_CHILDREN (node).length () != 2)
> + || SLP_TREE_CHILDREN (node).length () != 2
> + || SLP_TREE_LANE_PERMUTATION (node).length () % 2)
> return NULL;
>
> /* Match a blend of a plus and a minus op with the same number of plus and
> minus lanes on the same operands. */
> - slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> - slp_tree add = SLP_TREE_CHILDREN (node)[1];
> - bool swapped_p = false;
> - if (vect_match_expression_p (sub, PLUS_EXPR))
> - {
> - std::swap (add, sub);
> - swapped_p = true;
> - }
> - if (!(vect_match_expression_p (add, PLUS_EXPR)
> - && vect_match_expression_p (sub, MINUS_EXPR)))
> + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> + if (l0 == l1)
> + return NULL;
> + bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0],
> + PLUS_EXPR);
> + if (!l0add_p
> + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR))
> + return NULL;
> + bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1],
> + PLUS_EXPR);
> + if (!l1add_p
> + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR))
> return NULL;
> - if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
> - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
> - || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
> - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
> +
> + slp_tree l0node = SLP_TREE_CHILDREN (node)[l0];
> + slp_tree l1node = SLP_TREE_CHILDREN (node)[l1];
> + if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0]
> + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1])
> + || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1]
> + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0])))
> return NULL;
>
> for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
> {
> std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
> - if (swapped_p)
> - perm.first = perm.first == 0 ? 1 : 0;
> - /* It has to be alternating -, +, -, ...
> + /* It has to be alternating -, +, -,
> While we could permute the .ADDSUB inputs and the .ADDSUB output
> that's only profitable over the add + sub + blend if at least
> one of the permute is optimized which we can't determine here. */
> - if (perm.first != (i & 1)
> + if (perm.first != ((i & 1) ? l1 : l0)
> || perm.second != i)
> return NULL;
> }
>
> - if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> - return NULL;
> + /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... }
> + (l0add_p), see whether we have FMA variants. */
> + if (!l0add_p
> + && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
> + {
> + /* (c * d) -+ a */
> + if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node))
> + return new addsub_pattern (node_, IFN_VEC_FMADDSUB);
> + }
> + else if (l0add_p
> + && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR))
> + {
> + /* (c * d) +- a */
> + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node))
> + return new addsub_pattern (node_, IFN_VEC_FMSUBADD);
> + }
>
> - return new addsub_pattern (node_);
> + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> + return new addsub_pattern (node_, IFN_VEC_ADDSUB);
> +
> + return NULL;
> }
>
> void
> @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo)
> {
> slp_tree node = *m_node;
>
> - slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> - slp_tree add = SLP_TREE_CHILDREN (node)[1];
> - if (vect_match_expression_p (sub, PLUS_EXPR))
> - std::swap (add, sub);
> -
> - /* Modify the blend node in-place. */
> - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> -
> - /* Build IFN_VEC_ADDSUB from the sub representative operands. */
> - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> - gimple_assign_rhs1 (rep->stmt),
> - gimple_assign_rhs2 (rep->stmt));
> - gimple_call_set_lhs (call, make_ssa_name
> - (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> - gimple_call_set_nothrow (call, true);
> - gimple_set_bb (call, gimple_bb (rep->stmt));
> - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> - SLP_TREE_REPRESENTATIVE (node) = new_rep;
> - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> - STMT_SLP_TYPE (new_rep) = pure_slp;
> - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> - SLP_TREE_CODE (node) = ERROR_MARK;
> - SLP_TREE_LANE_PERMUTATION (node).release ();
> -
> - vect_free_slp_tree (sub);
> - vect_free_slp_tree (add);
> + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> +
> + switch (m_ifn)
> + {
> + case IFN_VEC_ADDSUB:
> + {
> + slp_tree sub = SLP_TREE_CHILDREN (node)[l0];
> + slp_tree add = SLP_TREE_CHILDREN (node)[l1];
> +
> + /* Modify the blend node in-place. */
> + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> +
> + /* Build IFN_VEC_ADDSUB from the sub representative operands. */
> + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> + gimple_assign_rhs1 (rep->stmt),
> + gimple_assign_rhs2 (rep->stmt));
> + gimple_call_set_lhs (call, make_ssa_name
> + (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> + gimple_call_set_nothrow (call, true);
> + gimple_set_bb (call, gimple_bb (rep->stmt));
> + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> + SLP_TREE_REPRESENTATIVE (node) = new_rep;
> + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> + STMT_SLP_TYPE (new_rep) = pure_slp;
> + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> + SLP_TREE_CODE (node) = ERROR_MARK;
> + SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> + vect_free_slp_tree (sub);
> + vect_free_slp_tree (add);
> + break;
> + }
> + case IFN_VEC_FMADDSUB:
> + case IFN_VEC_FMSUBADD:
> + {
> + slp_tree sub, add;
> + if (m_ifn == IFN_VEC_FMADDSUB)
> + {
> + sub = SLP_TREE_CHILDREN (node)[l0];
> + add = SLP_TREE_CHILDREN (node)[l1];
> + }
> + else /* m_ifn == IFN_VEC_FMSUBADD */
> + {
> + sub = SLP_TREE_CHILDREN (node)[l1];
> + add = SLP_TREE_CHILDREN (node)[l0];
> + }
> + slp_tree mul = SLP_TREE_CHILDREN (sub)[0];
> + /* Modify the blend node in-place. */
> + SLP_TREE_CHILDREN (node).safe_grow (3, true);
> + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0];
> + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1];
> + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1];
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++;
> +
> + /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. */
> + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub);
> + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul);
> + gcall *call = gimple_build_call_internal (m_ifn, 3,
> + gimple_assign_rhs1 (mrep->stmt),
> + gimple_assign_rhs2 (mrep->stmt),
> + gimple_assign_rhs2 (srep->stmt));
> + gimple_call_set_lhs (call, make_ssa_name
> + (TREE_TYPE (gimple_assign_lhs (srep->stmt))));
> + gimple_call_set_nothrow (call, true);
> + gimple_set_bb (call, gimple_bb (srep->stmt));
> + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep);
> + SLP_TREE_REPRESENTATIVE (node) = new_rep;
> + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> + STMT_SLP_TYPE (new_rep) = pure_slp;
> + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep));
> + SLP_TREE_CODE (node) = ERROR_MARK;
> + SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> + vect_free_slp_tree (sub);
> + vect_free_slp_tree (add);
> + break;
> + }
> + default:;
> + }
> }
>
> /*******************************************************************************
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index f08797c2bc0..5357cd0e7a4 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo)
> case CFN_COMPLEX_MUL:
> case CFN_COMPLEX_MUL_CONJ:
> case CFN_VEC_ADDSUB:
> + case CFN_VEC_FMADDSUB:
> + case CFN_VEC_FMSUBADD:
> vertices[idx].perm_in = 0;
> vertices[idx].perm_out = 0;
> default:;
> --
> 2.26.2
--
BR,
Hongtao
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
2021-07-06 2:16 ` Hongtao Liu
@ 2021-07-06 7:42 ` Richard Biener
2021-07-06 8:29 ` Hongtao Liu
0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2021-07-06 7:42 UTC (permalink / raw)
To: Hongtao Liu; +Cc: GCC Patches, Liu, Hongtao
On Tue, 6 Jul 2021, Hongtao Liu wrote:
> On Mon, Jul 5, 2021 at 10:09 PM Richard Biener <rguenther@suse.de> wrote:
> >
> > This adds named expanders for vec_fmaddsub<mode>4 and
> > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> > vfmsubaddXXXp{ds} instructions. This complements the previous
> > addition of ADDSUB support.
> >
> > x86 lacks SUBADD and the negate variants of FMA with mixed
> > plus/minus, so I did not add optabs or patterns for those, but
> > it would not be difficult if there's a target that has them.
> > Maybe one of the complex FMA patterns matches those variants?
> >
> > I did not dare to rewrite the numerous patterns to the new
> > canonical name but instead added two new expanders. Note I
> > did not cover AVX512 since the existing patterns are separated
> > and I have no easy way to test things there. Handling AVX512
> > should be easy as followup though.
> >
> > Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
> >
> > Any comments?
> >
> > Thanks,
> > Richard.
> >
> > 2021-07-05 Richard Biener <rguenther@suse.de>
> >
> > * doc/md.texi (vec_fmaddsub<mode>4): Document.
> > (vec_fmsubadd<mode>4): Likewise.
> > * optabs.def (vec_fmaddsub$a4): Add.
> > (vec_fmsubadd$a4): Likewise.
> > * internal-fn.def (IFN_VEC_FMADDSUB): Add.
> > (IFN_VEC_FMSUBADD): Likewise.
> > * tree-vect-slp-patterns.c (addsub_pattern::recognize):
> > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
> > (addsub_pattern::build): Likewise.
> > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
> > and CFN_VEC_FMSUBADD are not transparent for permutes.
> > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
> > (vec_fmsubadd<mode>4): Likewise.
> >
> > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
> > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
> > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
> > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> > ---
> > gcc/config/i386/sse.md | 19 ++
> > gcc/doc/md.texi | 14 ++
> > gcc/internal-fn.def | 3 +-
> > gcc/optabs.def | 2 +
> > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++
> > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++
> > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++
> > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++
> > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++-----
> > gcc/tree-vect-slp.c | 2 +
> > 10 files changed, 311 insertions(+), 57 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index bcf1605d147..6fc13c184bf 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -4644,6 +4644,25 @@
> > ;;
> > ;; But this doesn't seem useful in practice.
> >
> > +(define_expand "vec_fmaddsub<mode>4"
> > + [(set (match_operand:VF 0 "register_operand")
> > + (unspec:VF
> > + [(match_operand:VF 1 "nonimmediate_operand")
> > + (match_operand:VF 2 "nonimmediate_operand")
> > + (match_operand:VF 3 "nonimmediate_operand")]
> > + UNSPEC_FMADDSUB))]
> > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > +
> > +(define_expand "vec_fmsubadd<mode>4"
> > + [(set (match_operand:VF 0 "register_operand")
> > + (unspec:VF
> > + [(match_operand:VF 1 "nonimmediate_operand")
> > + (match_operand:VF 2 "nonimmediate_operand")
> > + (neg:VF
> > + (match_operand:VF 3 "nonimmediate_operand"))]
> > + UNSPEC_FMADDSUB))]
> > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > +
>
> W/ condition like
> "TARGET_FMA || TARGET_FMA4
> || (<MODE_SIZE> == 64 || TARGET_AVX512VL)"?
>
> the original expander "fmaddsub_<mode>" is only used by builtins which
> have their own guard for AVX512VL, so it doesn't matter that it lacks
> TARGET_AVX512VL
> BDESC (OPTION_MASK_ISA_AVX512VL, 0,
> CODE_FOR_avx512vl_fmaddsub_v4df_mask,
> "__builtin_ia32_vfmaddsubpd256_mask",
> IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int)
> V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
OK, that seems to work!
Bootstrapped and tested on x86_64-unknown-linux-gnu - are the
x86 backend changes OK?
Thanks,
Richard.
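As a sanity check, the expected-value tables in the four new testcases can be reproduced with a plain scalar model of the two kernels (a standalone sketch, independent of the patch; helper names are illustrative):

```c
/* Scalar model of the vect-fmaddsubXXXpd.c kernel: even lanes
   compute b*c - a, odd lanes b*c + a.  */
static void
scalar_fmaddsub (double *a, const double *b, const double *c, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
    }
}

/* Scalar model of the vect-fmsubaddXXXpd.c kernel: even lanes add,
   odd lanes subtract the accumulator.  */
static void
scalar_fmsubadd (double *a, const double *b, const double *c, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
      a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
    }
}
```

With a[i] = i, b[i] = 3*i, c[i] = 7*i this yields { 0, 22, 82, 192 } for fmaddsub and { 0, 20, 86, 186 } for fmsubadd, the constants checked in the testcases below.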
This adds named expanders for vec_fmaddsub<mode>4 and
vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
vfmsubaddXXXp{ds} instructions. This complements the previous
addition of ADDSUB support.
x86 lacks SUBADD and the negate variants of FMA with mixed
plus minus so I did not add optabs or patterns for those but
it would not be difficult if there's a target that has them.
2021-07-05 Richard Biener <rguenther@suse.de>
* doc/md.texi (vec_fmaddsub<mode>4): Document.
(vec_fmsubadd<mode>4): Likewise.
* optabs.def (vec_fmaddsub$a4): Add.
(vec_fmsubadd$a4): Likewise.
* internal-fn.def (IFN_VEC_FMADDSUB): Add.
(IFN_VEC_FMSUBADD): Likewise.
* tree-vect-slp-patterns.c (addsub_pattern::recognize):
Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
(addsub_pattern::build): Likewise.
* tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
and CFN_VEC_FMSUBADD are not transparent for permutes.
* config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
(vec_fmsubadd<mode>4): Likewise.
* gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
* gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
* gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
* gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
---
gcc/config/i386/sse.md | 19 ++
gcc/doc/md.texi | 14 ++
gcc/internal-fn.def | 3 +-
gcc/optabs.def | 2 +
.../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++
.../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++
.../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++
.../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++
gcc/tree-vect-slp-patterns.c | 192 +++++++++++++-----
gcc/tree-vect-slp.c | 2 +
10 files changed, 311 insertions(+), 57 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bcf1605d147..17c9e571d5d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4644,6 +4644,25 @@
;;
;; But this doesn't seem useful in practice.
+(define_expand "vec_fmaddsub<mode>4"
+ [(set (match_operand:VF 0 "register_operand")
+ (unspec:VF
+ [(match_operand:VF 1 "nonimmediate_operand")
+ (match_operand:VF 2 "nonimmediate_operand")
+ (match_operand:VF 3 "nonimmediate_operand")]
+ UNSPEC_FMADDSUB))]
+ "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")
+
+(define_expand "vec_fmsubadd<mode>4"
+ [(set (match_operand:VF 0 "register_operand")
+ (unspec:VF
+ [(match_operand:VF 1 "nonimmediate_operand")
+ (match_operand:VF 2 "nonimmediate_operand")
+ (neg:VF
+ (match_operand:VF 3 "nonimmediate_operand"))]
+ UNSPEC_FMADDSUB))]
+ "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")
+
(define_expand "fmaddsub_<mode>"
[(set (match_operand:VF 0 "register_operand")
(unspec:VF
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 1b918144330..cc92ebd26aa 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd
lanes doing addition. Operands 1 and 2 and the output operand are vectors
with mode @var{m}.
+@cindex @code{vec_fmaddsub@var{m}4} instruction pattern
+@item @samp{vec_fmaddsub@var{m}4}
+Alternating multiply subtract, add with even lanes doing subtract and odd
+lanes doing addition of the third operand to the multiplication result
+of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
+with mode @var{m}.
+
+@cindex @code{vec_fmsubadd@var{m}4} instruction pattern
+@item @samp{vec_fmsubadd@var{m}4}
+Alternating multiply add, subtract with even lanes doing addition and odd
+lanes doing subtraction of the third operand to the multiplication result
+of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
+with mode @var{m}.
+
These instructions are not allowed to @code{FAIL}.
@cindex @code{mulhisi3} instruction pattern
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c3b8e730960..a7003d5da8e 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
-
+DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
+DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
/* FP scales. */
DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 41ab2598eb6..51acc1be8f5 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
+OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
+OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
OPTAB_D (sync_add_optab, "sync_add$I$a")
OPTAB_D (sync_and_optab, "sync_and$I$a")
diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
new file mode 100644
index 00000000000..b30d10731a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O3 -mfma -save-temps" } */
+
+#include "fma-check.h"
+
+void __attribute__((noipa))
+check_fmaddsub (double * __restrict a, double *b, double *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
+ a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
+ }
+}
+
+static void
+fma_test (void)
+{
+ double a[4], b[4], c[4];
+ for (int i = 0; i < 4; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmaddsub (a, b, c, 2);
+ const double d[4] = { 0., 22., 82., 192. };
+ for (int i = 0; i < 4; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler "fmaddsub...pd" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
new file mode 100644
index 00000000000..cd2af8725a3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O3 -mfma -save-temps" } */
+
+#include "fma-check.h"
+
+void __attribute__((noipa))
+check_fmaddsub (float * __restrict a, float *b, float *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
+ a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
+ }
+}
+
+static void
+fma_test (void)
+{
+ float a[4], b[4], c[4];
+ for (int i = 0; i < 4; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmaddsub (a, b, c, 2);
+ const float d[4] = { 0., 22., 82., 192. };
+ for (int i = 0; i < 4; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler "fmaddsub...ps" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
new file mode 100644
index 00000000000..7ca2a275cc1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O3 -mfma -save-temps" } */
+
+#include "fma-check.h"
+
+void __attribute__((noipa))
+check_fmsubadd (double * __restrict a, double *b, double *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
+ a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
+ }
+}
+
+static void
+fma_test (void)
+{
+ double a[4], b[4], c[4];
+ for (int i = 0; i < 4; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmsubadd (a, b, c, 2);
+ const double d[4] = { 0., 20., 86., 186. };
+ for (int i = 0; i < 4; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler "fmsubadd...pd" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
new file mode 100644
index 00000000000..9ddd0e423db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O3 -mfma -save-temps" } */
+
+#include "fma-check.h"
+
+void __attribute__((noipa))
+check_fmsubadd (float * __restrict a, float *b, float *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
+ a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
+ }
+}
+
+static void
+fma_test (void)
+{
+ float a[4], b[4], c[4];
+ for (int i = 0; i < 4; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmsubadd (a, b, c, 2);
+ const float d[4] = { 0., 20., 86., 186. };
+ for (int i = 0; i < 4; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler "fmsubadd...ps" } } */
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 2671f91972d..f774cac4a4d 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
class addsub_pattern : public vect_pattern
{
public:
- addsub_pattern (slp_tree *node)
- : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {};
+ addsub_pattern (slp_tree *node, internal_fn ifn)
+ : vect_pattern (node, NULL, ifn) {};
void build (vec_info *);
@@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
{
slp_tree node = *node_;
if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
- || SLP_TREE_CHILDREN (node).length () != 2)
+ || SLP_TREE_CHILDREN (node).length () != 2
+ || SLP_TREE_LANE_PERMUTATION (node).length () % 2)
return NULL;
/* Match a blend of a plus and a minus op with the same number of plus and
minus lanes on the same operands. */
- slp_tree sub = SLP_TREE_CHILDREN (node)[0];
- slp_tree add = SLP_TREE_CHILDREN (node)[1];
- bool swapped_p = false;
- if (vect_match_expression_p (sub, PLUS_EXPR))
- {
- std::swap (add, sub);
- swapped_p = true;
- }
- if (!(vect_match_expression_p (add, PLUS_EXPR)
- && vect_match_expression_p (sub, MINUS_EXPR)))
+ unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
+ unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
+ if (l0 == l1)
+ return NULL;
+ bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0],
+ PLUS_EXPR);
+ if (!l0add_p
+ && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR))
+ return NULL;
+ bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1],
+ PLUS_EXPR);
+ if (!l1add_p
+ && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR))
return NULL;
- if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
- && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
- || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
- && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
+
+ slp_tree l0node = SLP_TREE_CHILDREN (node)[l0];
+ slp_tree l1node = SLP_TREE_CHILDREN (node)[l1];
+ if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0]
+ && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1])
+ || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1]
+ && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0])))
return NULL;
for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
{
std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
- if (swapped_p)
- perm.first = perm.first == 0 ? 1 : 0;
- /* It has to be alternating -, +, -, ...
+ /* It has to be alternating -, +, -, ...
While we could permute the .ADDSUB inputs and the .ADDSUB output
that's only profitable over the add + sub + blend if at least
one of the permute is optimized which we can't determine here. */
- if (perm.first != (i & 1)
+ if (perm.first != ((i & 1) ? l1 : l0)
|| perm.second != i)
return NULL;
}
- if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
- return NULL;
+ /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... }
+ (l0add_p), see whether we have FMA variants. */
+ if (!l0add_p
+ && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
+ {
+ /* (c * d) -+ a */
+ if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node))
+ return new addsub_pattern (node_, IFN_VEC_FMADDSUB);
+ }
+ else if (l0add_p
+ && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR))
+ {
+ /* (c * d) +- a */
+ if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node))
+ return new addsub_pattern (node_, IFN_VEC_FMSUBADD);
+ }
- return new addsub_pattern (node_);
+ if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
+ return new addsub_pattern (node_, IFN_VEC_ADDSUB);
+
+ return NULL;
}
void
@@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo)
{
slp_tree node = *m_node;
- slp_tree sub = SLP_TREE_CHILDREN (node)[0];
- slp_tree add = SLP_TREE_CHILDREN (node)[1];
- if (vect_match_expression_p (sub, PLUS_EXPR))
- std::swap (add, sub);
-
- /* Modify the blend node in-place. */
- SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
- SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
- SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
- SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
-
- /* Build IFN_VEC_ADDSUB from the sub representative operands. */
- stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
- gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
- gimple_assign_rhs1 (rep->stmt),
- gimple_assign_rhs2 (rep->stmt));
- gimple_call_set_lhs (call, make_ssa_name
- (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
- gimple_call_set_nothrow (call, true);
- gimple_set_bb (call, gimple_bb (rep->stmt));
- stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
- SLP_TREE_REPRESENTATIVE (node) = new_rep;
- STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
- STMT_SLP_TYPE (new_rep) = pure_slp;
- STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
- STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
- STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
- SLP_TREE_CODE (node) = ERROR_MARK;
- SLP_TREE_LANE_PERMUTATION (node).release ();
-
- vect_free_slp_tree (sub);
- vect_free_slp_tree (add);
+ unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
+ unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
+
+ switch (m_ifn)
+ {
+ case IFN_VEC_ADDSUB:
+ {
+ slp_tree sub = SLP_TREE_CHILDREN (node)[l0];
+ slp_tree add = SLP_TREE_CHILDREN (node)[l1];
+
+ /* Modify the blend node in-place. */
+ SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
+ SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
+
+ /* Build IFN_VEC_ADDSUB from the sub representative operands. */
+ stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
+ gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
+ gimple_assign_rhs1 (rep->stmt),
+ gimple_assign_rhs2 (rep->stmt));
+ gimple_call_set_lhs (call, make_ssa_name
+ (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
+ gimple_call_set_nothrow (call, true);
+ gimple_set_bb (call, gimple_bb (rep->stmt));
+ stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
+ SLP_TREE_REPRESENTATIVE (node) = new_rep;
+ STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
+ STMT_SLP_TYPE (new_rep) = pure_slp;
+ STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
+ STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
+ STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
+ SLP_TREE_CODE (node) = ERROR_MARK;
+ SLP_TREE_LANE_PERMUTATION (node).release ();
+
+ vect_free_slp_tree (sub);
+ vect_free_slp_tree (add);
+ break;
+ }
+ case IFN_VEC_FMADDSUB:
+ case IFN_VEC_FMSUBADD:
+ {
+ slp_tree sub, add;
+ if (m_ifn == IFN_VEC_FMADDSUB)
+ {
+ sub = SLP_TREE_CHILDREN (node)[l0];
+ add = SLP_TREE_CHILDREN (node)[l1];
+ }
+ else /* m_ifn == IFN_VEC_FMSUBADD */
+ {
+ sub = SLP_TREE_CHILDREN (node)[l1];
+ add = SLP_TREE_CHILDREN (node)[l0];
+ }
+ slp_tree mul = SLP_TREE_CHILDREN (sub)[0];
+ /* Modify the blend node in-place. */
+ SLP_TREE_CHILDREN (node).safe_grow (3, true);
+ SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0];
+ SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1];
+ SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1];
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
+ SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++;
+
+ /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. */
+ stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub);
+ stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul);
+ gcall *call = gimple_build_call_internal (m_ifn, 3,
+ gimple_assign_rhs1 (mrep->stmt),
+ gimple_assign_rhs2 (mrep->stmt),
+ gimple_assign_rhs2 (srep->stmt));
+ gimple_call_set_lhs (call, make_ssa_name
+ (TREE_TYPE (gimple_assign_lhs (srep->stmt))));
+ gimple_call_set_nothrow (call, true);
+ gimple_set_bb (call, gimple_bb (srep->stmt));
+ stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep);
+ SLP_TREE_REPRESENTATIVE (node) = new_rep;
+ STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
+ STMT_SLP_TYPE (new_rep) = pure_slp;
+ STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
+ STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
+ STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep));
+ SLP_TREE_CODE (node) = ERROR_MARK;
+ SLP_TREE_LANE_PERMUTATION (node).release ();
+
+ vect_free_slp_tree (sub);
+ vect_free_slp_tree (add);
+ break;
+ }
+ default:;
+ }
}
/*******************************************************************************
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index f08797c2bc0..5357cd0e7a4 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo)
case CFN_COMPLEX_MUL:
case CFN_COMPLEX_MUL_CONJ:
case CFN_VEC_ADDSUB:
+ case CFN_VEC_FMADDSUB:
+ case CFN_VEC_FMSUBADD:
vertices[idx].perm_in = 0;
vertices[idx].perm_out = 0;
default:;
--
2.26.2
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
2021-07-06 7:42 ` Richard Biener
@ 2021-07-06 8:29 ` Hongtao Liu
2021-07-07 7:30 ` Hongtao Liu
0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2021-07-06 8:29 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Liu, Hongtao
On Tue, Jul 6, 2021 at 3:42 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 6 Jul 2021, Hongtao Liu wrote:
>
> > On Mon, Jul 5, 2021 at 10:09 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > This adds named expanders for vec_fmaddsub<mode>4 and
> > > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> > > vfmsubaddXXXp{ds} instructions. This complements the previous
> > > addition of ADDSUB support.
> > >
> > > x86 lacks SUBADD and the negate variants of FMA with mixed
> > > plus minus so I did not add optabs or patterns for those but
> > > it would not be difficult if there's a target that has them.
> > > Maybe one of the complex fma patterns match those variants?
> > >
> > > I did not dare to rewrite the numerous patterns to the new
> > > canonical name but instead added two new expanders. Note I
> > > did not cover AVX512 since the existing patterns are separated
> > > and I have no easy way to test things there. Handling AVX512
> > > should be easy as followup though.
> > >
> > > Bootstrap and testing on x86_64-unknown-linux-gnu in progress.
> > >
> > > Any comments?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > 2021-07-05 Richard Biener <rguenther@suse.de>
> > >
> > > * doc/md.texi (vec_fmaddsub<mode>4): Document.
> > > (vec_fmsubadd<mode>4): Likewise.
> > > * optabs.def (vec_fmaddsub$a4): Add.
> > > (vec_fmsubadd$a4): Likewise.
> > > * internal-fn.def (IFN_VEC_FMADDSUB): Add.
> > > (IFN_VEC_FMSUBADD): Likewise.
> > > * tree-vect-slp-patterns.c (addsub_pattern::recognize):
> > > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
> > > (addsub_pattern::build): Likewise.
> > > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
> > > and CFN_VEC_FMSUBADD are not transparent for permutes.
> > > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
> > > (vec_fmsubadd<mode>4): Likewise.
> > >
> > > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
> > > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
> > > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
> > > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> > > ---
> > > gcc/config/i386/sse.md | 19 ++
> > > gcc/doc/md.texi | 14 ++
> > > gcc/internal-fn.def | 3 +-
> > > gcc/optabs.def | 2 +
> > > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++
> > > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++
> > > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++
> > > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++
> > > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++-----
> > > gcc/tree-vect-slp.c | 2 +
> > > 10 files changed, 311 insertions(+), 57 deletions(-)
> > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index bcf1605d147..6fc13c184bf 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -4644,6 +4644,25 @@
> > > ;;
> > > ;; But this doesn't seem useful in practice.
> > >
> > > +(define_expand "vec_fmaddsub<mode>4"
> > > + [(set (match_operand:VF 0 "register_operand")
> > > + (unspec:VF
> > > + [(match_operand:VF 1 "nonimmediate_operand")
> > > + (match_operand:VF 2 "nonimmediate_operand")
> > > + (match_operand:VF 3 "nonimmediate_operand")]
> > > + UNSPEC_FMADDSUB))]
> > > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > > +
> > > +(define_expand "vec_fmsubadd<mode>4"
> > > + [(set (match_operand:VF 0 "register_operand")
> > > + (unspec:VF
> > > + [(match_operand:VF 1 "nonimmediate_operand")
> > > + (match_operand:VF 2 "nonimmediate_operand")
> > > + (neg:VF
> > > + (match_operand:VF 3 "nonimmediate_operand"))]
> > > + UNSPEC_FMADDSUB))]
> > > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
> > > +
> >
> > W/ condition like
> > "TARGET_FMA || TARGET_FMA4
|| (<MODE_SIZE> == 64 || TARGET_AVX512VL)"?
> >
the original expander "fmaddsub_<mode>" is only used by builtins, which
have their own guard for AVX512VL, so it doesn't matter that it lacks
TARGET_AVX512VL:
> > BDESC (OPTION_MASK_ISA_AVX512VL, 0,
> > CODE_FOR_avx512vl_fmaddsub_v4df_mask,
> > "__builtin_ia32_vfmaddsubpd256_mask",
> > IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int)
> > V4DF_FTYPE_V4DF_V4DF_V4DF_UQI)
>
> OK, that seems to work!
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu - are the
> x86 backend changes OK?
>
Yes, LGTM.
> Thanks,
> Richard.
>
>
> This adds named expanders for vec_fmaddsub<mode>4 and
> vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> vfmsubaddXXXp{ds} instructions. This complements the previous
> addition of ADDSUB support.
>
> x86 lacks SUBADD and the negate variants of FMA with mixed
> plus minus so I did not add optabs or patterns for those but
> it would not be difficult if there's a target that has them.
>
> 2021-07-05 Richard Biener <rguenther@suse.de>
>
> * doc/md.texi (vec_fmaddsub<mode>4): Document.
> (vec_fmsubadd<mode>4): Likewise.
> * optabs.def (vec_fmaddsub$a4): Add.
> (vec_fmsubadd$a4): Likewise.
> * internal-fn.def (IFN_VEC_FMADDSUB): Add.
> (IFN_VEC_FMSUBADD): Likewise.
> * tree-vect-slp-patterns.c (addsub_pattern::recognize):
> Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
> (addsub_pattern::build): Likewise.
> * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB
> and CFN_VEC_FMSUBADD are not transparent for permutes.
> * config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
> (vec_fmsubadd<mode>4): Likewise.
>
> * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
> * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
> * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
> * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
> ---
> gcc/config/i386/sse.md | 19 ++
> gcc/doc/md.texi | 14 ++
> gcc/internal-fn.def | 3 +-
> gcc/optabs.def | 2 +
> .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++
> .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++
> .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++
> .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++
> gcc/tree-vect-slp-patterns.c | 192 +++++++++++++-----
> gcc/tree-vect-slp.c | 2 +
> 10 files changed, 311 insertions(+), 57 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index bcf1605d147..17c9e571d5d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -4644,6 +4644,25 @@
> ;;
> ;; But this doesn't seem useful in practice.
>
> +(define_expand "vec_fmaddsub<mode>4"
> + [(set (match_operand:VF 0 "register_operand")
> + (unspec:VF
> + [(match_operand:VF 1 "nonimmediate_operand")
> + (match_operand:VF 2 "nonimmediate_operand")
> + (match_operand:VF 3 "nonimmediate_operand")]
> + UNSPEC_FMADDSUB))]
> + "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")
> +
> +(define_expand "vec_fmsubadd<mode>4"
> + [(set (match_operand:VF 0 "register_operand")
> + (unspec:VF
> + [(match_operand:VF 1 "nonimmediate_operand")
> + (match_operand:VF 2 "nonimmediate_operand")
> + (neg:VF
> + (match_operand:VF 3 "nonimmediate_operand"))]
> + UNSPEC_FMADDSUB))]
> + "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)")
> +
> (define_expand "fmaddsub_<mode>"
> [(set (match_operand:VF 0 "register_operand")
> (unspec:VF
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 1b918144330..cc92ebd26aa 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd
> lanes doing addition. Operands 1 and 2 and the output operand are vectors
> with mode @var{m}.
>
> +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern
> +@item @samp{vec_fmaddsub@var{m}4}
> +Alternating multiply subtract, add with even lanes doing subtract and odd
> +lanes doing addition of the third operand to the multiplication result
> +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors
> +with mode @var{m}.
> +
> +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern
> +@item @samp{vec_fmsubadd@var{m}4}
> +Alternating multiply add, subtract with even lanes doing addition and odd
> +lanes doing subtraction of the third operand to the multiplication result
> +of the first two operands. Operands 1, 2 and 3 and the outout operand are vectors
> +with mode @var{m}.
> +
> These instructions are not allowed to @code{FAIL}.
>
> @cindex @code{mulhisi3} instruction pattern
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index c3b8e730960..a7003d5da8e 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
> DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
> -
> +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
> +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
>
> /* FP scales. */
> DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 41ab2598eb6..51acc1be8f5 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> OPTAB_D (vec_addsub_optab, "vec_addsub$a3")
> +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4")
> +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4")
>
> OPTAB_D (sync_add_optab, "sync_add$I$a")
> OPTAB_D (sync_and_optab, "sync_and$I$a")
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> new file mode 100644
> index 00000000000..b30d10731a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmaddsub (double * __restrict a, double *b, double *c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> + }
> +}
> +
> +static void
> +fma_test (void)
> +{
> + double a[4], b[4], c[4];
> + for (int i = 0; i < 4; ++i)
> + {
> + a[i] = i;
> + b[i] = 3*i;
> + c[i] = 7*i;
> + }
> + check_fmaddsub (a, b, c, 2);
> + const double d[4] = { 0., 22., 82., 192. };
> + for (int i = 0; i < 4; ++i)
> + if (a[i] != d[i])
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmaddsub...pd" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> new file mode 100644
> index 00000000000..cd2af8725a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmaddsub (float * __restrict a, float *b, float *c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0];
> + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1];
> + }
> +}
> +
> +static void
> +fma_test (void)
> +{
> + float a[4], b[4], c[4];
> + for (int i = 0; i < 4; ++i)
> + {
> + a[i] = i;
> + b[i] = 3*i;
> + c[i] = 7*i;
> + }
> + check_fmaddsub (a, b, c, 2);
> + const float d[4] = { 0., 22., 82., 192. };
> + for (int i = 0; i < 4; ++i)
> + if (a[i] != d[i])
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmaddsub...ps" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> new file mode 100644
> index 00000000000..7ca2a275cc1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmsubadd (double * __restrict a, double *b, double *c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> + }
> +}
> +
> +static void
> +fma_test (void)
> +{
> + double a[4], b[4], c[4];
> + for (int i = 0; i < 4; ++i)
> + {
> + a[i] = i;
> + b[i] = 3*i;
> + c[i] = 7*i;
> + }
> + check_fmsubadd (a, b, c, 2);
> + const double d[4] = { 0., 20., 86., 186. };
> + for (int i = 0; i < 4; ++i)
> + if (a[i] != d[i])
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmsubadd...pd" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> new file mode 100644
> index 00000000000..9ddd0e423db
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c
> @@ -0,0 +1,34 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target fma } */
> +/* { dg-options "-O3 -mfma -save-temps" } */
> +
> +#include "fma-check.h"
> +
> +void __attribute__((noipa))
> +check_fmsubadd (float * __restrict a, float *b, float *c, int n)
> +{
> + for (int i = 0; i < n; ++i)
> + {
> + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0];
> + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1];
> + }
> +}
> +
> +static void
> +fma_test (void)
> +{
> + float a[4], b[4], c[4];
> + for (int i = 0; i < 4; ++i)
> + {
> + a[i] = i;
> + b[i] = 3*i;
> + c[i] = 7*i;
> + }
> + check_fmsubadd (a, b, c, 2);
> + const float d[4] = { 0., 20., 86., 186. };
> + for (int i = 0; i < 4; ++i)
> + if (a[i] != d[i])
> + __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-assembler "fmsubadd...ps" } } */
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 2671f91972d..f774cac4a4d 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */)
> class addsub_pattern : public vect_pattern
> {
> public:
> - addsub_pattern (slp_tree *node)
> - : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {};
> + addsub_pattern (slp_tree *node, internal_fn ifn)
> + : vect_pattern (node, NULL, ifn) {};
>
> void build (vec_info *);
>
> @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
> {
> slp_tree node = *node_;
> if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> - || SLP_TREE_CHILDREN (node).length () != 2)
> + || SLP_TREE_CHILDREN (node).length () != 2
> + || SLP_TREE_LANE_PERMUTATION (node).length () % 2)
> return NULL;
>
> /* Match a blend of a plus and a minus op with the same number of plus and
> minus lanes on the same operands. */
> - slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> - slp_tree add = SLP_TREE_CHILDREN (node)[1];
> - bool swapped_p = false;
> - if (vect_match_expression_p (sub, PLUS_EXPR))
> - {
> - std::swap (add, sub);
> - swapped_p = true;
> - }
> - if (!(vect_match_expression_p (add, PLUS_EXPR)
> - && vect_match_expression_p (sub, MINUS_EXPR)))
> + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> + if (l0 == l1)
> + return NULL;
> + bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0],
> + PLUS_EXPR);
> + if (!l0add_p
> + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR))
> + return NULL;
> + bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1],
> + PLUS_EXPR);
> + if (!l1add_p
> + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR))
> return NULL;
> - if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0]
> - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1])
> - || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1]
> - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0])))
> +
> + slp_tree l0node = SLP_TREE_CHILDREN (node)[l0];
> + slp_tree l1node = SLP_TREE_CHILDREN (node)[l1];
> + if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0]
> + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1])
> + || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1]
> + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0])))
> return NULL;
>
> for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i)
> {
> std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i];
> - if (swapped_p)
> - perm.first = perm.first == 0 ? 1 : 0;
> - /* It has to be alternating -, +, -, ...
> + /* It has to be alternating -, +, -, ...
> While we could permute the .ADDSUB inputs and the .ADDSUB output
> that's only profitable over the add + sub + blend if at least
> one of the permute is optimized which we can't determine here. */
> - if (perm.first != (i & 1)
> + if (perm.first != ((i & 1) ? l1 : l0)
> || perm.second != i)
> return NULL;
> }
>
> - if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> - return NULL;
> + /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... }
> + (l0add_p), see whether we have FMA variants. */
> + if (!l0add_p
> + && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR))
> + {
> + /* (c * d) -+ a */
> + if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node))
> + return new addsub_pattern (node_, IFN_VEC_FMADDSUB);
> + }
> + else if (l0add_p
> + && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR))
> + {
> + /* (c * d) +- a */
> + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node))
> + return new addsub_pattern (node_, IFN_VEC_FMSUBADD);
> + }
>
> - return new addsub_pattern (node_);
> + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node))
> + return new addsub_pattern (node_, IFN_VEC_ADDSUB);
> +
> + return NULL;
> }
>
> void
> @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo)
> {
> slp_tree node = *m_node;
>
> - slp_tree sub = SLP_TREE_CHILDREN (node)[0];
> - slp_tree add = SLP_TREE_CHILDREN (node)[1];
> - if (vect_match_expression_p (sub, PLUS_EXPR))
> - std::swap (add, sub);
> -
> - /* Modify the blend node in-place. */
> - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> -
> - /* Build IFN_VEC_ADDSUB from the sub representative operands. */
> - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> - gimple_assign_rhs1 (rep->stmt),
> - gimple_assign_rhs2 (rep->stmt));
> - gimple_call_set_lhs (call, make_ssa_name
> - (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> - gimple_call_set_nothrow (call, true);
> - gimple_set_bb (call, gimple_bb (rep->stmt));
> - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> - SLP_TREE_REPRESENTATIVE (node) = new_rep;
> - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> - STMT_SLP_TYPE (new_rep) = pure_slp;
> - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> - SLP_TREE_CODE (node) = ERROR_MARK;
> - SLP_TREE_LANE_PERMUTATION (node).release ();
> -
> - vect_free_slp_tree (sub);
> - vect_free_slp_tree (add);
> + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first;
> + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first;
> +
> + switch (m_ifn)
> + {
> + case IFN_VEC_ADDSUB:
> + {
> + slp_tree sub = SLP_TREE_CHILDREN (node)[l0];
> + slp_tree add = SLP_TREE_CHILDREN (node)[l1];
> +
> + /* Modify the blend node in-place. */
> + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0];
> + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1];
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> +
> + /* Build IFN_VEC_ADDSUB from the sub representative operands. */
> + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub);
> + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2,
> + gimple_assign_rhs1 (rep->stmt),
> + gimple_assign_rhs2 (rep->stmt));
> + gimple_call_set_lhs (call, make_ssa_name
> + (TREE_TYPE (gimple_assign_lhs (rep->stmt))));
> + gimple_call_set_nothrow (call, true);
> + gimple_set_bb (call, gimple_bb (rep->stmt));
> + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
> + SLP_TREE_REPRESENTATIVE (node) = new_rep;
> + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> + STMT_SLP_TYPE (new_rep) = pure_slp;
> + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
> + SLP_TREE_CODE (node) = ERROR_MARK;
> + SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> + vect_free_slp_tree (sub);
> + vect_free_slp_tree (add);
> + break;
> + }
> + case IFN_VEC_FMADDSUB:
> + case IFN_VEC_FMSUBADD:
> + {
> + slp_tree sub, add;
> + if (m_ifn == IFN_VEC_FMADDSUB)
> + {
> + sub = SLP_TREE_CHILDREN (node)[l0];
> + add = SLP_TREE_CHILDREN (node)[l1];
> + }
> + else /* m_ifn == IFN_VEC_FMSUBADD */
> + {
> + sub = SLP_TREE_CHILDREN (node)[l1];
> + add = SLP_TREE_CHILDREN (node)[l0];
> + }
> + slp_tree mul = SLP_TREE_CHILDREN (sub)[0];
> + /* Modify the blend node in-place. */
> + SLP_TREE_CHILDREN (node).safe_grow (3, true);
> + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0];
> + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1];
> + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1];
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++;
> + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++;
> +
> + /* Build IFN_VEC_FMADDSUB/IFN_VEC_FMSUBADD from the mul/sub representative operands. */
> + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub);
> + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul);
> + gcall *call = gimple_build_call_internal (m_ifn, 3,
> + gimple_assign_rhs1 (mrep->stmt),
> + gimple_assign_rhs2 (mrep->stmt),
> + gimple_assign_rhs2 (srep->stmt));
> + gimple_call_set_lhs (call, make_ssa_name
> + (TREE_TYPE (gimple_assign_lhs (srep->stmt))));
> + gimple_call_set_nothrow (call, true);
> + gimple_set_bb (call, gimple_bb (srep->stmt));
> + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep);
> + SLP_TREE_REPRESENTATIVE (node) = new_rep;
> + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
> + STMT_SLP_TYPE (new_rep) = pure_slp;
> + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
> + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
> + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep));
> + SLP_TREE_CODE (node) = ERROR_MARK;
> + SLP_TREE_LANE_PERMUTATION (node).release ();
> +
> + vect_free_slp_tree (sub);
> + vect_free_slp_tree (add);
> + break;
> + }
> + default:;
> + }
> }
>
> /*******************************************************************************
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index f08797c2bc0..5357cd0e7a4 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo)
> case CFN_COMPLEX_MUL:
> case CFN_COMPLEX_MUL_CONJ:
> case CFN_VEC_ADDSUB:
> + case CFN_VEC_FMADDSUB:
> + case CFN_VEC_FMSUBADD:
> vertices[idx].perm_in = 0;
> vertices[idx].perm_out = 0;
> default:;
> --
> 2.26.2
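For reference, the scalar shape the new IFN_VEC_FMADDSUB pattern recognizes (and which the patch's testcases exercise) can be written as the following minimal sketch — the four-lane helper and its name are illustrative only, not part of the patch:

```c
#include <assert.h>

/* Even lanes compute b*c - a, odd lanes b*c + a; this alternating
   blend of an FMS and an FMA across lanes is what maps to a single
   vfmaddsubXXXp{d,s} instruction.  Hypothetical 4-lane helper.  */
static void
fmaddsub4 (double *a, const double *b, const double *c)
{
  a[0] = b[0] * c[0] - a[0];
  a[1] = b[1] * c[1] + a[1];
  a[2] = b[2] * c[2] - a[2];
  a[3] = b[3] * c[3] + a[3];
}
```

With a[i] = i, b[i] = 3*i, c[i] = 7*i this yields 21*i*i -/+ i per lane, matching the expected values in the vect-fmaddsubXXXpd.c testcase.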
--
BR,
Hongtao
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
2021-07-06 8:29 ` Hongtao Liu
@ 2021-07-07 7:30 ` Hongtao Liu
2021-07-07 7:57 ` Richard Biener
0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2021-07-07 7:30 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Liu, Hongtao
[-- Attachment #1: Type: text/plain, Size: 227 bytes --]
> > > > and I have no easy way to test things there. Handling AVX512
> > > > should be easy as followup though.
Here's the patch adding AVX512F tests for the FMADDSUB/FMSUBADD SLP patterns.
Pushed to the trunk.
--
BR,
Hongtao
[-- Attachment #2: 0001-i386-Add-avx512-tests-for-MADDSUB-and-FMSUBADD-SLP-v.patch --]
[-- Type: text/x-patch, Size: 8576 bytes --]
From 2dc666974cca3a62686f4d7135ca36c25d61a802 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 7 Jul 2021 15:19:42 +0800
Subject: [PATCH] [i386] Add avx512 tests for FMADDSUB and FMSUBADD SLP
vectorization patterns.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c: New test.
* gcc.target/i386/avx512f-vect-fmaddsubXXXps.c: New test.
* gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c: New test.
* gcc.target/i386/avx512f-vect-fmsubaddXXXps.c: New test.
---
.../i386/avx512f-vect-fmaddsubXXXpd.c | 41 +++++++++++++++
.../i386/avx512f-vect-fmaddsubXXXps.c | 50 +++++++++++++++++++
.../i386/avx512f-vect-fmsubaddXXXpd.c | 41 +++++++++++++++
.../i386/avx512f-vect-fmsubaddXXXps.c | 50 +++++++++++++++++++
4 files changed, 182 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
new file mode 100644
index 00000000000..734f9e01443
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O3 -mfma -save-temps -mavx512f -mprefer-vector-width=512" } */
+
+#include "fma-check.h"
+void __attribute__((noipa))
+check_fmaddsub (double * __restrict a, double *b, double *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[8*i + 0] = b[8*i + 0] * c[8*i + 0] - a[8*i + 0];
+ a[8*i + 1] = b[8*i + 1] * c[8*i + 1] + a[8*i + 1];
+ a[8*i + 2] = b[8*i + 2] * c[8*i + 2] - a[8*i + 2];
+ a[8*i + 3] = b[8*i + 3] * c[8*i + 3] + a[8*i + 3];
+ a[8*i + 4] = b[8*i + 4] * c[8*i + 4] - a[8*i + 4];
+ a[8*i + 5] = b[8*i + 5] * c[8*i + 5] + a[8*i + 5];
+ a[8*i + 6] = b[8*i + 6] * c[8*i + 6] - a[8*i + 6];
+ a[8*i + 7] = b[8*i + 7] * c[8*i + 7] + a[8*i + 7];
+ }
+}
+
+static void
+fma_test (void)
+{
+ if (!__builtin_cpu_supports ("avx512f"))
+ return;
+ double a[8], b[8], c[8];
+ for (int i = 0; i < 8; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmaddsub (a, b, c, 1);
+ const double d[8] = { 0., 22., 82., 192., 332., 530., 750., 1036.};
+ for (int i = 0; i < 8; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler {(?n)fmaddsub...pd[ \t].*%zmm[0-9]} } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
new file mode 100644
index 00000000000..ae196c5ef48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
+
+#include "fma-check.h"
+void __attribute__((noipa))
+check_fmaddsub (float * __restrict a, float *b, float *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[16*i + 0] = b[16*i + 0] * c[16*i + 0] - a[16*i + 0];
+ a[16*i + 1] = b[16*i + 1] * c[16*i + 1] + a[16*i + 1];
+ a[16*i + 2] = b[16*i + 2] * c[16*i + 2] - a[16*i + 2];
+ a[16*i + 3] = b[16*i + 3] * c[16*i + 3] + a[16*i + 3];
+ a[16*i + 4] = b[16*i + 4] * c[16*i + 4] - a[16*i + 4];
+ a[16*i + 5] = b[16*i + 5] * c[16*i + 5] + a[16*i + 5];
+ a[16*i + 6] = b[16*i + 6] * c[16*i + 6] - a[16*i + 6];
+ a[16*i + 7] = b[16*i + 7] * c[16*i + 7] + a[16*i + 7];
+ a[16*i + 8] = b[16*i + 8] * c[16*i + 8] - a[16*i + 8];
+ a[16*i + 9] = b[16*i + 9] * c[16*i + 9] + a[16*i + 9];
+ a[16*i + 10] = b[16*i + 10] * c[16*i + 10] - a[16*i + 10];
+ a[16*i + 11] = b[16*i + 11] * c[16*i + 11] + a[16*i + 11];
+ a[16*i + 12] = b[16*i + 12] * c[16*i + 12] - a[16*i + 12];
+ a[16*i + 13] = b[16*i + 13] * c[16*i + 13] + a[16*i + 13];
+ a[16*i + 14] = b[16*i + 14] * c[16*i + 14] - a[16*i + 14];
+ a[16*i + 15] = b[16*i + 15] * c[16*i + 15] + a[16*i + 15];
+ }
+}
+
+static void
+fma_test (void)
+{
+ if (!__builtin_cpu_supports ("avx512f"))
+ return;
+ float a[16], b[16], c[16];
+ for (int i = 0; i < 16; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmaddsub (a, b, c, 1);
+ const float d[16] = { 0., 22., 82., 192., 332., 530., 750., 1036.,
+ 1336, 1710., 2090., 2552., 3012., 3562., 4102., 4740.};
+ for (int i = 0; i < 16; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler {(?n)fmaddsub...ps[ \t].*%zmm[0-9]} } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
new file mode 100644
index 00000000000..cde76db1755
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
+
+#include "fma-check.h"
+void __attribute__((noipa))
+check_fmaddsub (double * __restrict a, double *b, double *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[8*i + 0] = b[8*i + 0] * c[8*i + 0] + a[8*i + 0];
+ a[8*i + 1] = b[8*i + 1] * c[8*i + 1] - a[8*i + 1];
+ a[8*i + 2] = b[8*i + 2] * c[8*i + 2] + a[8*i + 2];
+ a[8*i + 3] = b[8*i + 3] * c[8*i + 3] - a[8*i + 3];
+ a[8*i + 4] = b[8*i + 4] * c[8*i + 4] + a[8*i + 4];
+ a[8*i + 5] = b[8*i + 5] * c[8*i + 5] - a[8*i + 5];
+ a[8*i + 6] = b[8*i + 6] * c[8*i + 6] + a[8*i + 6];
+ a[8*i + 7] = b[8*i + 7] * c[8*i + 7] - a[8*i + 7];
+ }
+}
+
+static void
+fma_test (void)
+{
+ if (!__builtin_cpu_supports ("avx512f"))
+ return;
+ double a[8], b[8], c[8];
+ for (int i = 0; i < 8; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmaddsub (a, b, c, 1);
+ const double d[8] = { 0., 20., 86., 186., 340., 520., 762., 1022.};
+ for (int i = 0; i < 8; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler {(?n)fmsubadd...pd[ \t].*%zmm[0-9]} } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
new file mode 100644
index 00000000000..59de39f4112
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */
+
+#include "fma-check.h"
+void __attribute__((noipa))
+check_fmaddsub (float * __restrict a, float *b, float *c, int n)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ a[16*i + 0] = b[16*i + 0] * c[16*i + 0] + a[16*i + 0];
+ a[16*i + 1] = b[16*i + 1] * c[16*i + 1] - a[16*i + 1];
+ a[16*i + 2] = b[16*i + 2] * c[16*i + 2] + a[16*i + 2];
+ a[16*i + 3] = b[16*i + 3] * c[16*i + 3] - a[16*i + 3];
+ a[16*i + 4] = b[16*i + 4] * c[16*i + 4] + a[16*i + 4];
+ a[16*i + 5] = b[16*i + 5] * c[16*i + 5] - a[16*i + 5];
+ a[16*i + 6] = b[16*i + 6] * c[16*i + 6] + a[16*i + 6];
+ a[16*i + 7] = b[16*i + 7] * c[16*i + 7] - a[16*i + 7];
+ a[16*i + 8] = b[16*i + 8] * c[16*i + 8] + a[16*i + 8];
+ a[16*i + 9] = b[16*i + 9] * c[16*i + 9] - a[16*i + 9];
+ a[16*i + 10] = b[16*i + 10] * c[16*i + 10] + a[16*i + 10];
+ a[16*i + 11] = b[16*i + 11] * c[16*i + 11] - a[16*i + 11];
+ a[16*i + 12] = b[16*i + 12] * c[16*i + 12] + a[16*i + 12];
+ a[16*i + 13] = b[16*i + 13] * c[16*i + 13] - a[16*i + 13];
+ a[16*i + 14] = b[16*i + 14] * c[16*i + 14] + a[16*i + 14];
+ a[16*i + 15] = b[16*i + 15] * c[16*i + 15] - a[16*i + 15];
+ }
+}
+
+static void
+fma_test (void)
+{
+ if (!__builtin_cpu_supports ("avx512f"))
+ return;
+ float a[16], b[16], c[16];
+ for (int i = 0; i < 16; ++i)
+ {
+ a[i] = i;
+ b[i] = 3*i;
+ c[i] = 7*i;
+ }
+ check_fmaddsub (a, b, c, 1);
+ const float d[16] = { 0., 20., 86., 186., 340., 520., 762., 1022.,
+ 1352, 1692., 2110., 2530., 3036., 3536., 4130., 4710.};
+ for (int i = 0; i < 16; ++i)
+ if (a[i] != d[i])
+ __builtin_abort ();
+}
+
+/* { dg-final { scan-assembler {(?n)fmsubadd...ps[ \t].*%zmm[0-9]} } } */
--
2.18.1
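As a sanity check on the expected-value tables in the two double-precision tests above: with a[i] = i, b[i] = 3*i, c[i] = 7*i, each lane computes 21*i*i minus or plus i depending on lane parity. A scalar reference (a sketch for verification only, not part of the patch) reproduces both tables:

```c
#include <assert.h>

/* Scalar reference for one lane of the fmaddsub test: even lanes
   are b*c - a, odd lanes b*c + a.  */
static double
ref_fmaddsub (int i)
{
  double p = 3.0 * i * 7.0 * i;   /* b[i] * c[i] == 21 * i * i */
  return (i & 1) ? p + i : p - i;
}

/* fmsubadd mirrors the signs: even lanes b*c + a, odd lanes b*c - a.  */
static double
ref_fmsubadd (int i)
{
  double p = 3.0 * i * 7.0 * i;
  return (i & 1) ? p - i : p + i;
}
```

These values are exact in double precision for the small integers used, so the tests' `!=` comparisons against the d[] tables are safe.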
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
2021-07-07 7:30 ` Hongtao Liu
@ 2021-07-07 7:57 ` Richard Biener
0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2021-07-07 7:57 UTC (permalink / raw)
To: Hongtao Liu; +Cc: GCC Patches, Liu, Hongtao
On Wed, 7 Jul 2021, Hongtao Liu wrote:
> > > > > and I have no easy way to test things there. Handling AVX512
> > > > > should be easy as followup though.
>
> Here's the patch adding avx512f tests for FMADDSUB/FMSUBADD slp patterns.
> Pushed to the trunk.
Thanks!
Richard.
end of thread, other threads:[~2021-07-07 7:57 UTC | newest]
Thread overview: 8+ messages
2021-07-05 14:09 [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs Richard Biener
2021-07-05 14:25 ` Richard Biener
2021-07-05 14:38 ` Richard Biener
2021-07-06 2:16 ` Hongtao Liu
2021-07-06 7:42 ` Richard Biener
2021-07-06 8:29 ` Hongtao Liu
2021-07-07 7:30 ` Hongtao Liu
2021-07-07 7:57 ` Richard Biener