* [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
@ 2021-07-05 14:09 Richard Biener
  2021-07-05 14:25 ` Richard Biener
  2021-07-06  2:16 ` Hongtao Liu
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Biener @ 2021-07-05 14:09 UTC (permalink / raw)
To: gcc-patches; +Cc: tamar.christina, ubizjak, hongtao.liu

This adds named expanders for vec_fmaddsub<mode>4 and
vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
vfmsubaddXXXp{ds} instructions.  This complements the previous
addition of ADDSUB support.

x86 lacks SUBADD and the negate variants of FMA with mixed
plus minus so I did not add optabs or patterns for those but
it would not be difficult if there's a target that has them.
Maybe one of the complex fma patterns match those variants?

I did not dare to rewrite the numerous patterns to the new
canonical name but instead added two new expanders.  Note I
did not cover AVX512 since the existing patterns are separated
and I have no easy way to test things there.  Handling AVX512
should be easy as followup though.

Bootstrap and testing on x86_64-unknown-linux-gnu in progress.

Any comments?

Thanks,
Richard.

2021-07-05  Richard Biener  <rguenther@suse.de>

	* doc/md.texi (vec_fmaddsub<mode>4): Document.
	(vec_fmsubadd<mode>4): Likewise.
	* optabs.def (vec_fmaddsub$a4): Add.
	(vec_fmsubadd$a4): Likewise.
	* internal-fn.def (IFN_VEC_FMADDSUB): Add.
	(IFN_VEC_FMSUBADD): Likewise.
	* tree-vect-slp-patterns.c (addsub_pattern::recognize): Refactor
	to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD.
	(addsub_pattern::build): Likewise.
	* tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB and
	CFN_VEC_FMSUBADD are not transparent for permutes.
	* config/i386/sse.md (vec_fmaddsub<mode>4): New expander.
	(vec_fmsubadd<mode>4): Likewise.

	* gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase.
	* gcc.target/i386/vect-fmaddsubXXXps.c: Likewise.
	* gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise.
	* gcc.target/i386/vect-fmsubaddXXXps.c: Likewise.
--- gcc/config/i386/sse.md | 19 ++ gcc/doc/md.texi | 14 ++ gcc/internal-fn.def | 3 +- gcc/optabs.def | 2 + .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++ .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++ .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++ .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++ gcc/tree-vect-slp-patterns.c | 192 +++++++++++++----- gcc/tree-vect-slp.c | 2 + 10 files changed, 311 insertions(+), 57 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index bcf1605d147..6fc13c184bf 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4644,6 +4644,25 @@ ;; ;; But this doesn't seem useful in practice. +(define_expand "vec_fmaddsub<mode>4" + [(set (match_operand:VF 0 "register_operand") + (unspec:VF + [(match_operand:VF 1 "nonimmediate_operand") + (match_operand:VF 2 "nonimmediate_operand") + (match_operand:VF 3 "nonimmediate_operand")] + UNSPEC_FMADDSUB))] + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") + +(define_expand "vec_fmsubadd<mode>4" + [(set (match_operand:VF 0 "register_operand") + (unspec:VF + [(match_operand:VF 1 "nonimmediate_operand") + (match_operand:VF 2 "nonimmediate_operand") + (neg:VF + (match_operand:VF 3 "nonimmediate_operand"))] + UNSPEC_FMADDSUB))] + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") + (define_expand "fmaddsub_<mode>" [(set (match_operand:VF 0 "register_operand") (unspec:VF diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 1b918144330..cc92ebd26aa 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd lanes doing addition. Operands 1 and 2 and the outout operand are vectors with mode @var{m}. 
+@cindex @code{vec_fmaddsub@var{m}4} instruction pattern +@item @samp{vec_fmaddsub@var{m}4} +Alternating multiply subtract, add with even lanes doing subtract and odd +lanes doing addition of the third operand to the multiplication result +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors +with mode @var{m}. + +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern +@item @samp{vec_fmsubadd@var{m}4} +Alternating multiply add, subtract with even lanes doing addition and odd +lanes doing subtraction of the third operand to the multiplication result +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors +with mode @var{m}. + +These instructions are not allowed to @code{FAIL}. @cindex @code{mulhisi3} instruction pattern diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index c3b8e730960..a7003d5da8e 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) - +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) /* FP scales. 
*/ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git a/gcc/optabs.def b/gcc/optabs.def index 41ab2598eb6..51acc1be8f5 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") OPTAB_D (vec_addsub_optab, "vec_addsub$a3") +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") OPTAB_D (sync_add_optab, "sync_add$I$a") OPTAB_D (sync_and_optab, "sync_and$I$a") diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c new file mode 100644 index 00000000000..b30d10731a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O3 -mfma -save-temps" } */ + +#include "fma-check.h" + +void __attribute__((noipa)) +check_fmaddsub (double * __restrict a, double *b, double *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; + } +} + +static void +fma_test (void) +{ + double a[4], b[4], c[4]; + for (int i = 0; i < 4; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmaddsub (a, b, c, 2); + const double d[4] = { 0., 22., 82., 192. 
}; + for (int i = 0; i < 4; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler "fmaddsub...pd" } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c new file mode 100644 index 00000000000..cd2af8725a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O3 -mfma -save-temps" } */ + +#include "fma-check.h" + +void __attribute__((noipa)) +check_fmaddsub (float * __restrict a, float *b, float *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; + } +} + +static void +fma_test (void) +{ + float a[4], b[4], c[4]; + for (int i = 0; i < 4; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmaddsub (a, b, c, 2); + const float d[4] = { 0., 22., 82., 192. }; + for (int i = 0; i < 4; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler "fmaddsub...ps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c new file mode 100644 index 00000000000..7ca2a275cc1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O3 -mfma -save-temps" } */ + +#include "fma-check.h" + +void __attribute__((noipa)) +check_fmsubadd (double * __restrict a, double *b, double *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; + } +} + +static void +fma_test (void) +{ + double a[4], b[4], c[4]; + for (int i = 0; i < 4; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmsubadd (a, b, c, 2); + const double d[4] = { 0., 20., 86., 186. 
}; + for (int i = 0; i < 4; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler "fmsubadd...pd" } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c new file mode 100644 index 00000000000..9ddd0e423db --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O3 -mfma -save-temps" } */ + +#include "fma-check.h" + +void __attribute__((noipa)) +check_fmsubadd (float * __restrict a, float *b, float *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; + } +} + +static void +fma_test (void) +{ + float a[4], b[4], c[4]; + for (int i = 0; i < 4; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmsubadd (a, b, c, 2); + const float d[4] = { 0., 20., 86., 186. }; + for (int i = 0; i < 4; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler "fmsubadd...ps" } } */ diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c index 2671f91972d..f774cac4a4d 100644 --- a/gcc/tree-vect-slp-patterns.c +++ b/gcc/tree-vect-slp-patterns.c @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */) class addsub_pattern : public vect_pattern { public: - addsub_pattern (slp_tree *node) - : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {}; + addsub_pattern (slp_tree *node, internal_fn ifn) + : vect_pattern (node, NULL, ifn) {}; void build (vec_info *); @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) { slp_tree node = *node_; if (SLP_TREE_CODE (node) != VEC_PERM_EXPR - || SLP_TREE_CHILDREN (node).length () != 2) + || SLP_TREE_CHILDREN (node).length () != 2 + || SLP_TREE_LANE_PERMUTATION (node).length () % 2) return NULL; /* Match a blend of a plus and a minus 
op with the same number of plus and minus lanes on the same operands. */ - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; - slp_tree add = SLP_TREE_CHILDREN (node)[1]; - bool swapped_p = false; - if (vect_match_expression_p (sub, PLUS_EXPR)) - { - std::swap (add, sub); - swapped_p = true; - } - if (!(vect_match_expression_p (add, PLUS_EXPR) - && vect_match_expression_p (sub, MINUS_EXPR))) + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; + if (l0 == l1) + return NULL; + bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], + PLUS_EXPR); + if (!l0add_p + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR)) + return NULL; + bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], + PLUS_EXPR); + if (!l1add_p + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR)) return NULL; - if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0] - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1]) - || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1] - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0]))) + + slp_tree l0node = SLP_TREE_CHILDREN (node)[l0]; + slp_tree l1node = SLP_TREE_CHILDREN (node)[l1]; + if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0] + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1]) + || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1] + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0]))) return NULL; for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i) { std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i]; - if (swapped_p) - perm.first = perm.first == 0 ? 1 : 0; - /* It has to be alternating -, +, -, ... 
+ /* It has to be alternating -, +, -, While we could permute the .ADDSUB inputs and the .ADDSUB output that's only profitable over the add + sub + blend if at least one of the permute is optimized which we can't determine here. */ - if (perm.first != (i & 1) + if (perm.first != ((i & 1) ? l1 : l0) || perm.second != i) return NULL; } - if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) - return NULL; + /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... } + (l0add_p), see whether we have FMA variants. */ + if (!l0add_p + && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR)) + { + /* (c * d) -+ a */ + if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node)) + return new addsub_pattern (node_, IFN_VEC_FMADDSUB); + } + else if (l0add_p + && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR)) + { + /* (c * d) +- a */ + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node)) + return new addsub_pattern (node_, IFN_VEC_FMSUBADD); + } - return new addsub_pattern (node_); + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) + return new addsub_pattern (node_, IFN_VEC_ADDSUB); + + return NULL; } void @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo) { slp_tree node = *m_node; - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; - slp_tree add = SLP_TREE_CHILDREN (node)[1]; - if (vect_match_expression_p (sub, PLUS_EXPR)) - std::swap (add, sub); - - /* Modify the blend node in-place. */ - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; - - /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, - gimple_assign_rhs1 (rep->stmt), - gimple_assign_rhs2 (rep->stmt)); - gimple_call_set_lhs (call, make_ssa_name - (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); - gimple_call_set_nothrow (call, true); - gimple_set_bb (call, gimple_bb (rep->stmt)); - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); - SLP_TREE_REPRESENTATIVE (node) = new_rep; - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; - STMT_SLP_TYPE (new_rep) = pure_slp; - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); - SLP_TREE_CODE (node) = ERROR_MARK; - SLP_TREE_LANE_PERMUTATION (node).release (); - - vect_free_slp_tree (sub); - vect_free_slp_tree (add); + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; + + switch (m_ifn) + { + case IFN_VEC_ADDSUB: + { + slp_tree sub = SLP_TREE_CHILDREN (node)[l0]; + slp_tree add = SLP_TREE_CHILDREN (node)[l1]; + + /* Modify the blend node in-place. */ + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; + + /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, + gimple_assign_rhs1 (rep->stmt), + gimple_assign_rhs2 (rep->stmt)); + gimple_call_set_lhs (call, make_ssa_name + (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); + gimple_call_set_nothrow (call, true); + gimple_set_bb (call, gimple_bb (rep->stmt)); + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); + SLP_TREE_REPRESENTATIVE (node) = new_rep; + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; + STMT_SLP_TYPE (new_rep) = pure_slp; + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); + SLP_TREE_CODE (node) = ERROR_MARK; + SLP_TREE_LANE_PERMUTATION (node).release (); + + vect_free_slp_tree (sub); + vect_free_slp_tree (add); + break; + } + case IFN_VEC_FMADDSUB: + case IFN_VEC_FMSUBADD: + { + slp_tree sub, add; + if (m_ifn == IFN_VEC_FMADDSUB) + { + sub = SLP_TREE_CHILDREN (node)[l0]; + add = SLP_TREE_CHILDREN (node)[l1]; + } + else /* m_ifn == IFN_VEC_FMSUBADD */ + { + sub = SLP_TREE_CHILDREN (node)[l1]; + add = SLP_TREE_CHILDREN (node)[l0]; + } + slp_tree mul = SLP_TREE_CHILDREN (sub)[0]; + /* Modify the blend node in-place. */ + SLP_TREE_CHILDREN (node).safe_grow (3, true); + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0]; + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1]; + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1]; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++; + + /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. 
*/ + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub); + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul); + gcall *call = gimple_build_call_internal (m_ifn, 3, + gimple_assign_rhs1 (mrep->stmt), + gimple_assign_rhs2 (mrep->stmt), + gimple_assign_rhs2 (srep->stmt)); + gimple_call_set_lhs (call, make_ssa_name + (TREE_TYPE (gimple_assign_lhs (srep->stmt)))); + gimple_call_set_nothrow (call, true); + gimple_set_bb (call, gimple_bb (srep->stmt)); + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep); + SLP_TREE_REPRESENTATIVE (node) = new_rep; + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; + STMT_SLP_TYPE (new_rep) = pure_slp; + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep)); + SLP_TREE_CODE (node) = ERROR_MARK; + SLP_TREE_LANE_PERMUTATION (node).release (); + + vect_free_slp_tree (sub); + vect_free_slp_tree (add); + break; + } + default:; + } } /******************************************************************************* diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index f08797c2bc0..5357cd0e7a4 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo) case CFN_COMPLEX_MUL: case CFN_COMPLEX_MUL_CONJ: case CFN_VEC_ADDSUB: + case CFN_VEC_FMADDSUB: + case CFN_VEC_FMSUBADD: vertices[idx].perm_in = 0; vertices[idx].perm_out = 0; default:; -- 2.26.2 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs
  2021-07-05 14:09 [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs Richard Biener
@ 2021-07-05 14:25 ` Richard Biener
  2021-07-05 14:38   ` Richard Biener
  2021-07-06  2:16 ` Hongtao Liu
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Biener @ 2021-07-05 14:25 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Hongtao Liu

On Mon, Jul 5, 2021 at 4:09 PM Richard Biener <rguenther@suse.de> wrote:
>
> This adds named expanders for vec_fmaddsub<mode>4 and
> vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and
> vfmsubaddXXXp{ds} instructions.  This complements the previous
> addition of ADDSUB support.
>
> x86 lacks SUBADD and the negate variants of FMA with mixed
> plus minus so I did not add optabs or patterns for those but
> it would not be difficult if there's a target that has them.
> Maybe one of the complex fma patterns match those variants?
>
> I did not dare to rewrite the numerous patterns to the new
> canonical name but instead added two new expanders.  Note I
> did not cover AVX512 since the existing patterns are separated
> and I have no easy way to test things there.  Handling AVX512
> should be easy as followup though.
>
> Bootstrap and testing on x86_64-unknown-linux-gnu in progress.

FYI, building libgfortran matmul_c4 we hit

/home/rguenther/src/trunk/libgfortran/generated/matmul_c4.c:1781:1: error: unrecognizable insn:
 1781 | }
      | ^
(insn 5408 5407 5409 213 (set (reg:V8SF 1454 [ vect__4368.5363 ])
        (unspec:V8SF [
                (reg:V8SF 4391)
                (reg:V8SF 4398)
                (reg:V8SF 4415 [ vect__2005.5362 ])
            ] UNSPEC_FMADDSUB)) -1
     (nil))
during RTL pass: vregs

so it looks like the existing fmaddsub_<mode> expander cannot be
simply re-purposed?

> Any comments?
>
> Thanks,
> Richard.
>
> 2021-07-05  Richard Biener  <rguenther@suse.de>
>
>         * doc/md.texi (vec_fmaddsub<mode>4): Document.
>         (vec_fmsubadd<mode>4): Likewise.
>         * optabs.def (vec_fmaddsub$a4): Add.
> (vec_fmsubadd$a4): Likewise. > * internal-fn.def (IFN_VEC_FMADDSUB): Add. > (IFN_VEC_FMSUBADD): Likewise. > * tree-vect-slp-patterns.c (addsub_pattern::recognize): > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD. > (addsub_pattern::build): Likewise. > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB > and CFN_VEC_FMSUBADD are not transparent for permutes. > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander. > (vec_fmsubadd<mode>4): Likewise. > > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase. > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise. > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise. > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise. > --- > gcc/config/i386/sse.md | 19 ++ > gcc/doc/md.texi | 14 ++ > gcc/internal-fn.def | 3 +- > gcc/optabs.def | 2 + > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++ > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++ > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++ > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++ > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++----- > gcc/tree-vect-slp.c | 2 + > 10 files changed, 311 insertions(+), 57 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index bcf1605d147..6fc13c184bf 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -4644,6 +4644,25 @@ > ;; > ;; But this doesn't seem useful in practice. 
> > +(define_expand "vec_fmaddsub<mode>4" > + [(set (match_operand:VF 0 "register_operand") > + (unspec:VF > + [(match_operand:VF 1 "nonimmediate_operand") > + (match_operand:VF 2 "nonimmediate_operand") > + (match_operand:VF 3 "nonimmediate_operand")] > + UNSPEC_FMADDSUB))] > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > + > +(define_expand "vec_fmsubadd<mode>4" > + [(set (match_operand:VF 0 "register_operand") > + (unspec:VF > + [(match_operand:VF 1 "nonimmediate_operand") > + (match_operand:VF 2 "nonimmediate_operand") > + (neg:VF > + (match_operand:VF 3 "nonimmediate_operand"))] > + UNSPEC_FMADDSUB))] > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > + > (define_expand "fmaddsub_<mode>" > [(set (match_operand:VF 0 "register_operand") > (unspec:VF > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 1b918144330..cc92ebd26aa 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd > lanes doing addition. Operands 1 and 2 and the outout operand are vectors > with mode @var{m}. > > +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern > +@item @samp{vec_fmaddsub@var{m}4} > +Alternating multiply subtract, add with even lanes doing subtract and odd > +lanes doing addition of the third operand to the multiplication result > +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors > +with mode @var{m}. > + > +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern > +@item @samp{vec_fmsubadd@var{m}4} > +Alternating multiply add, subtract with even lanes doing addition and odd > +lanes doing subtraction of the third operand to the multiplication result > +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors > +with mode @var{m}. > + > These instructions are not allowed to @code{FAIL}. 
> > @cindex @code{mulhisi3} instruction pattern > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index c3b8e730960..a7003d5da8e 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) > DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) > - > +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) > +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) > > /* FP scales. */ > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > diff --git a/gcc/optabs.def b/gcc/optabs.def > index 41ab2598eb6..51acc1be8f5 100644 > --- a/gcc/optabs.def > +++ b/gcc/optabs.def > @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") > OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") > OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") > OPTAB_D (vec_addsub_optab, "vec_addsub$a3") > +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") > +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") > > OPTAB_D (sync_add_optab, "sync_add$I$a") > OPTAB_D (sync_and_optab, "sync_and$I$a") > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > new file mode 100644 > index 00000000000..b30d10731a7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmaddsub (double * __restrict a, double *b, double *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; > + } > +} > + > 
+static void > +fma_test (void) > +{ > + double a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmaddsub (a, b, c, 2); > + const double d[4] = { 0., 22., 82., 192. }; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmaddsub...pd" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > new file mode 100644 > index 00000000000..cd2af8725a3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmaddsub (float * __restrict a, float *b, float *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + float a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmaddsub (a, b, c, 2); > + const float d[4] = { 0., 22., 82., 192. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmaddsub...ps" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > new file mode 100644 > index 00000000000..7ca2a275cc1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmsubadd (double * __restrict a, double *b, double *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + double a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmsubadd (a, b, c, 2); > + const double d[4] = { 0., 20., 86., 186. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmsubadd...pd" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > new file mode 100644 > index 00000000000..9ddd0e423db > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmsubadd (float * __restrict a, float *b, float *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + float a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmsubadd (a, b, c, 2); > + const float d[4] = { 0., 20., 86., 186. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmsubadd...ps" } } */ > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c > index 2671f91972d..f774cac4a4d 100644 > --- a/gcc/tree-vect-slp-patterns.c > +++ b/gcc/tree-vect-slp-patterns.c > @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */) > class addsub_pattern : public vect_pattern > { > public: > - addsub_pattern (slp_tree *node) > - : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {}; > + addsub_pattern (slp_tree *node, internal_fn ifn) > + : vect_pattern (node, NULL, ifn) {}; > > void build (vec_info *); > > @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) > { > slp_tree node = *node_; > if (SLP_TREE_CODE (node) != VEC_PERM_EXPR > - || SLP_TREE_CHILDREN (node).length () != 2) > + || SLP_TREE_CHILDREN (node).length () != 2 > + || SLP_TREE_LANE_PERMUTATION (node).length () % 2) > return NULL; > > /* Match a blend of a plus and a minus op with the same number of plus and > minus lanes on the same operands. 
*/ > - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; > - slp_tree add = SLP_TREE_CHILDREN (node)[1]; > - bool swapped_p = false; > - if (vect_match_expression_p (sub, PLUS_EXPR)) > - { > - std::swap (add, sub); > - swapped_p = true; > - } > - if (!(vect_match_expression_p (add, PLUS_EXPR) > - && vect_match_expression_p (sub, MINUS_EXPR))) > + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; > + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; > + if (l0 == l1) > + return NULL; > + bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], > + PLUS_EXPR); > + if (!l0add_p > + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR)) > + return NULL; > + bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], > + PLUS_EXPR); > + if (!l1add_p > + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR)) > return NULL; > - if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0] > - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1]) > - || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1] > - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0]))) > + > + slp_tree l0node = SLP_TREE_CHILDREN (node)[l0]; > + slp_tree l1node = SLP_TREE_CHILDREN (node)[l1]; > + if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0] > + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1]) > + || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1] > + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0]))) > return NULL; > > for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i) > { > std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i]; > - if (swapped_p) > - perm.first = perm.first == 0 ? 1 : 0; > - /* It has to be alternating -, +, -, ... 
> + /* It has to be alternating -, +, -, > While we could permute the .ADDSUB inputs and the .ADDSUB output > that's only profitable over the add + sub + blend if at least > one of the permute is optimized which we can't determine here. */ > - if (perm.first != (i & 1) > + if (perm.first != ((i & 1) ? l1 : l0) > || perm.second != i) > return NULL; > } > > - if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) > - return NULL; > + /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... } > + (l0add_p), see whether we have FMA variants. */ > + if (!l0add_p > + && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR)) > + { > + /* (c * d) -+ a */ > + if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node)) > + return new addsub_pattern (node_, IFN_VEC_FMADDSUB); > + } > + else if (l0add_p > + && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR)) > + { > + /* (c * d) +- a */ > + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node)) > + return new addsub_pattern (node_, IFN_VEC_FMSUBADD); > + } > > - return new addsub_pattern (node_); > + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) > + return new addsub_pattern (node_, IFN_VEC_ADDSUB); > + > + return NULL; > } > > void > @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo) > { > slp_tree node = *m_node; > > - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; > - slp_tree add = SLP_TREE_CHILDREN (node)[1]; > - if (vect_match_expression_p (sub, PLUS_EXPR)) > - std::swap (add, sub); > - > - /* Modify the blend node in-place. */ > - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; > - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; > - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > - > - /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ > - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); > - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, > - gimple_assign_rhs1 (rep->stmt), > - gimple_assign_rhs2 (rep->stmt)); > - gimple_call_set_lhs (call, make_ssa_name > - (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); > - gimple_call_set_nothrow (call, true); > - gimple_set_bb (call, gimple_bb (rep->stmt)); > - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); > - SLP_TREE_REPRESENTATIVE (node) = new_rep; > - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > - STMT_SLP_TYPE (new_rep) = pure_slp; > - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); > - SLP_TREE_CODE (node) = ERROR_MARK; > - SLP_TREE_LANE_PERMUTATION (node).release (); > - > - vect_free_slp_tree (sub); > - vect_free_slp_tree (add); > + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; > + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; > + > + switch (m_ifn) > + { > + case IFN_VEC_ADDSUB: > + { > + slp_tree sub = SLP_TREE_CHILDREN (node)[l0]; > + slp_tree add = SLP_TREE_CHILDREN (node)[l1]; > + > + /* Modify the blend node in-place. */ > + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; > + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > + > + /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ > + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); > + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, > + gimple_assign_rhs1 (rep->stmt), > + gimple_assign_rhs2 (rep->stmt)); > + gimple_call_set_lhs (call, make_ssa_name > + (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); > + gimple_call_set_nothrow (call, true); > + gimple_set_bb (call, gimple_bb (rep->stmt)); > + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); > + SLP_TREE_REPRESENTATIVE (node) = new_rep; > + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > + STMT_SLP_TYPE (new_rep) = pure_slp; > + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); > + SLP_TREE_CODE (node) = ERROR_MARK; > + SLP_TREE_LANE_PERMUTATION (node).release (); > + > + vect_free_slp_tree (sub); > + vect_free_slp_tree (add); > + break; > + } > + case IFN_VEC_FMADDSUB: > + case IFN_VEC_FMSUBADD: > + { > + slp_tree sub, add; > + if (m_ifn == IFN_VEC_FMADDSUB) > + { > + sub = SLP_TREE_CHILDREN (node)[l0]; > + add = SLP_TREE_CHILDREN (node)[l1]; > + } > + else /* m_ifn == IFN_VEC_FMSUBADD */ > + { > + sub = SLP_TREE_CHILDREN (node)[l1]; > + add = SLP_TREE_CHILDREN (node)[l0]; > + } > + slp_tree mul = SLP_TREE_CHILDREN (sub)[0]; > + /* Modify the blend node in-place. */ > + SLP_TREE_CHILDREN (node).safe_grow (3, true); > + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0]; > + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1]; > + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1]; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++; > + > + /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. 
*/ > + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub); > + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul); > + gcall *call = gimple_build_call_internal (m_ifn, 3, > + gimple_assign_rhs1 (mrep->stmt), > + gimple_assign_rhs2 (mrep->stmt), > + gimple_assign_rhs2 (srep->stmt)); > + gimple_call_set_lhs (call, make_ssa_name > + (TREE_TYPE (gimple_assign_lhs (srep->stmt)))); > + gimple_call_set_nothrow (call, true); > + gimple_set_bb (call, gimple_bb (srep->stmt)); > + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep); > + SLP_TREE_REPRESENTATIVE (node) = new_rep; > + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > + STMT_SLP_TYPE (new_rep) = pure_slp; > + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep)); > + SLP_TREE_CODE (node) = ERROR_MARK; > + SLP_TREE_LANE_PERMUTATION (node).release (); > + > + vect_free_slp_tree (sub); > + vect_free_slp_tree (add); > + break; > + } > + default:; > + } > } > > /******************************************************************************* > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > index f08797c2bc0..5357cd0e7a4 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo) > case CFN_COMPLEX_MUL: > case CFN_COMPLEX_MUL_CONJ: > case CFN_VEC_ADDSUB: > + case CFN_VEC_FMADDSUB: > + case CFN_VEC_FMSUBADD: > vertices[idx].perm_in = 0; > vertices[idx].perm_out = 0; > default:; > -- > 2.26.2 ^ permalink raw reply [flat|nested] 8+ messages in thread
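[Editorial note] For readers unfamiliar with the addsub family: the lane semantics the two new optabs implement can be modeled in plain scalar C as below. This is an illustrative sketch only — the function names are invented here and nothing in it is part of the patch; it simply mirrors the loops in the new run testcases.

```c
#include <assert.h>

/* Scalar model of vec_fmaddsub<mode>4: even lanes compute b*c - a,
   odd lanes compute b*c + a (subtract on even, add on odd).  */
static void
model_fmaddsub (double *out, const double *b, const double *c,
		const double *a, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      out[i] = b[i] * c[i] - a[i];		     /* even lane: mul-sub */
      out[i + 1] = b[i + 1] * c[i + 1] + a[i + 1];   /* odd lane: mul-add */
    }
}

/* Scalar model of vec_fmsubadd<mode>4: the roles are swapped, even
   lanes add the third operand and odd lanes subtract it.  */
static void
model_fmsubadd (double *out, const double *b, const double *c,
		const double *a, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      out[i] = b[i] * c[i] + a[i];		     /* even lane: mul-add */
      out[i + 1] = b[i + 1] * c[i + 1] - a[i + 1];   /* odd lane: mul-sub */
    }
}
```

With the testcase inputs a = {0,1,2,3}, b = {0,3,6,9}, c = {0,7,14,21} these reproduce the expected arrays d checked in the new run tests, {0, 22, 82, 192} and {0, 20, 86, 186} respectively.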
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs 2021-07-05 14:25 ` Richard Biener @ 2021-07-05 14:38 ` Richard Biener 0 siblings, 0 replies; 8+ messages in thread From: Richard Biener @ 2021-07-05 14:38 UTC (permalink / raw) To: Richard Biener; +Cc: GCC Patches, Hongtao Liu On Mon, 5 Jul 2021, Richard Biener wrote: > On Mon, Jul 5, 2021 at 4:09 PM Richard Biener <rguenther@suse.de> wrote: > > > > This adds named expanders for vec_fmaddsub<mode>4 and > > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and > > vfmsubaddXXXp{ds} instructions. This complements the previous > > addition of ADDSUB support. > > > > x86 lacks SUBADD and the negate variants of FMA with mixed > > plus minus so I did not add optabs or patterns for those but > > it would not be difficult if there's a target that has them. > > Maybe one of the complex fma patterns match those variants? > > > > I did not dare to rewrite the numerous patterns to the new > > canonical name but instead added two new expanders. Note I > > did not cover AVX512 since the existing patterns are separated > > and I have no easy way to test things there. Handling AVX512 > > should be easy as followup though. > > > > Bootstrap and testing on x86_64-unknown-linux-gnu in progress. > > FYI, building libgfortran matmul_c4 we hit > > /home/rguenther/src/trunk/libgfortran/generated/matmul_c4.c:1781:1: > error: unrecognizable insn: > 1781 | } > | ^ > (insn 5408 5407 5409 213 (set (reg:V8SF 1454 [ vect__4368.5363 ]) > (unspec:V8SF [ > (reg:V8SF 4391) > (reg:V8SF 4398) > (reg:V8SF 4415 [ vect__2005.5362 ]) > ] UNSPEC_FMADDSUB)) -1 > (nil)) > during RTL pass: vregs > > so it looks like the existing fmaddsub_<mode> expander cannot be > simply re-purposed? Ah, using the VF_128_256 iterator and removing the || TARGET_AVX512F predication fixes it. There's an avx512f but not fma target variant of matmul which likely lacks avx512vl for the above. So consider it changed this way. 
Not sure if there's a more appropriate iterator that catches this case. Richard. > > Any comments? > > > > Thanks, > > Richard. > > > > 2021-07-05 Richard Biener <rguenther@suse.de> > > > > * doc/md.texi (vec_fmaddsub<mode>4): Document. > > (vec_fmsubadd<mode>4): Likewise. > > * optabs.def (vec_fmaddsub$a4): Add. > > (vec_fmsubadd$a4): Likewise. > > * internal-fn.def (IFN_VEC_FMADDSUB): Add. > > (IFN_VEC_FMSUBADD): Likewise. > > * tree-vect-slp-patterns.c (addsub_pattern::recognize): > > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD. > > (addsub_pattern::build): Likewise. > > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB > > and CFN_VEC_FMSUBADD are not transparent for permutes. > > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander. > > (vec_fmsubadd<mode>4): Likewise. > > > > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase. > > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise. > > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise. > > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise. 
> > --- > > gcc/config/i386/sse.md | 19 ++ > > gcc/doc/md.texi | 14 ++ > > gcc/internal-fn.def | 3 +- > > gcc/optabs.def | 2 + > > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++ > > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++ > > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++ > > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++ > > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++----- > > gcc/tree-vect-slp.c | 2 + > > 10 files changed, 311 insertions(+), 57 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > index bcf1605d147..6fc13c184bf 100644 > > --- a/gcc/config/i386/sse.md > > +++ b/gcc/config/i386/sse.md > > @@ -4644,6 +4644,25 @@ > > ;; > > ;; But this doesn't seem useful in practice. 
> > > > +(define_expand "vec_fmaddsub<mode>4" > > + [(set (match_operand:VF 0 "register_operand") > > + (unspec:VF > > + [(match_operand:VF 1 "nonimmediate_operand") > > + (match_operand:VF 2 "nonimmediate_operand") > > + (match_operand:VF 3 "nonimmediate_operand")] > > + UNSPEC_FMADDSUB))] > > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > > + > > +(define_expand "vec_fmsubadd<mode>4" > > + [(set (match_operand:VF 0 "register_operand") > > + (unspec:VF > > + [(match_operand:VF 1 "nonimmediate_operand") > > + (match_operand:VF 2 "nonimmediate_operand") > > + (neg:VF > > + (match_operand:VF 3 "nonimmediate_operand"))] > > + UNSPEC_FMADDSUB))] > > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > > + > > (define_expand "fmaddsub_<mode>" > > [(set (match_operand:VF 0 "register_operand") > > (unspec:VF > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > index 1b918144330..cc92ebd26aa 100644 > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd > > lanes doing addition. Operands 1 and 2 and the outout operand are vectors > > with mode @var{m}. > > > > +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern > > +@item @samp{vec_fmaddsub@var{m}4} > > +Alternating multiply subtract, add with even lanes doing subtract and odd > > +lanes doing addition of the third operand to the multiplication result > > +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors > > +with mode @var{m}. > > + > > +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern > > +@item @samp{vec_fmsubadd@var{m}4} > > +Alternating multiply add, subtract with even lanes doing addition and odd > > +lanes doing subtraction of the third operand to the multiplication result > > +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors > > +with mode @var{m}. > > + > > These instructions are not allowed to @code{FAIL}. 
> > > > @cindex @code{mulhisi3} instruction pattern > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > > index c3b8e730960..a7003d5da8e 100644 > > --- a/gcc/internal-fn.def > > +++ b/gcc/internal-fn.def > > @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) > > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) > > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) > > DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) > > - > > +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) > > +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) > > > > /* FP scales. */ > > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > index 41ab2598eb6..51acc1be8f5 100644 > > --- a/gcc/optabs.def > > +++ b/gcc/optabs.def > > @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") > > OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") > > OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") > > OPTAB_D (vec_addsub_optab, "vec_addsub$a3") > > +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") > > +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") > > > > OPTAB_D (sync_add_optab, "sync_add$I$a") > > OPTAB_D (sync_and_optab, "sync_and$I$a") > > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > > new file mode 100644 > > index 00000000000..b30d10731a7 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > > @@ -0,0 +1,34 @@ > > +/* { dg-do run } */ > > +/* { dg-require-effective-target fma } */ > > +/* { dg-options "-O3 -mfma -save-temps" } */ > > + > > +#include "fma-check.h" > > + > > +void __attribute__((noipa)) > > +check_fmaddsub (double * __restrict a, double *b, double *c, int n) > > +{ > > + for (int i = 0; i < n; ++i) > > + { > > + a[2*i + 0] = b[2*i + 0] * c[2*i + 
0] - a[2*i + 0]; > > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; > > + } > > +} > > + > > +static void > > +fma_test (void) > > +{ > > + double a[4], b[4], c[4]; > > + for (int i = 0; i < 4; ++i) > > + { > > + a[i] = i; > > + b[i] = 3*i; > > + c[i] = 7*i; > > + } > > + check_fmaddsub (a, b, c, 2); > > + const double d[4] = { 0., 22., 82., 192. }; > > + for (int i = 0; i < 4; ++i) > > + if (a[i] != d[i]) > > + __builtin_abort (); > > +} > > + > > +/* { dg-final { scan-assembler "fmaddsub...pd" } } */ > > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > > new file mode 100644 > > index 00000000000..cd2af8725a3 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > > @@ -0,0 +1,34 @@ > > +/* { dg-do run } */ > > +/* { dg-require-effective-target fma } */ > > +/* { dg-options "-O3 -mfma -save-temps" } */ > > + > > +#include "fma-check.h" > > + > > +void __attribute__((noipa)) > > +check_fmaddsub (float * __restrict a, float *b, float *c, int n) > > +{ > > + for (int i = 0; i < n; ++i) > > + { > > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; > > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; > > + } > > +} > > + > > +static void > > +fma_test (void) > > +{ > > + float a[4], b[4], c[4]; > > + for (int i = 0; i < 4; ++i) > > + { > > + a[i] = i; > > + b[i] = 3*i; > > + c[i] = 7*i; > > + } > > + check_fmaddsub (a, b, c, 2); > > + const float d[4] = { 0., 22., 82., 192. 
}; > > + for (int i = 0; i < 4; ++i) > > + if (a[i] != d[i]) > > + __builtin_abort (); > > +} > > + > > +/* { dg-final { scan-assembler "fmaddsub...ps" } } */ > > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > > new file mode 100644 > > index 00000000000..7ca2a275cc1 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > > @@ -0,0 +1,34 @@ > > +/* { dg-do run } */ > > +/* { dg-require-effective-target fma } */ > > +/* { dg-options "-O3 -mfma -save-temps" } */ > > + > > +#include "fma-check.h" > > + > > +void __attribute__((noipa)) > > +check_fmsubadd (double * __restrict a, double *b, double *c, int n) > > +{ > > + for (int i = 0; i < n; ++i) > > + { > > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; > > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; > > + } > > +} > > + > > +static void > > +fma_test (void) > > +{ > > + double a[4], b[4], c[4]; > > + for (int i = 0; i < 4; ++i) > > + { > > + a[i] = i; > > + b[i] = 3*i; > > + c[i] = 7*i; > > + } > > + check_fmsubadd (a, b, c, 2); > > + const double d[4] = { 0., 20., 86., 186. 
}; > > + for (int i = 0; i < 4; ++i) > > + if (a[i] != d[i]) > > + __builtin_abort (); > > +} > > + > > +/* { dg-final { scan-assembler "fmsubadd...pd" } } */ > > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > > new file mode 100644 > > index 00000000000..9ddd0e423db > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > > @@ -0,0 +1,34 @@ > > +/* { dg-do run } */ > > +/* { dg-require-effective-target fma } */ > > +/* { dg-options "-O3 -mfma -save-temps" } */ > > + > > +#include "fma-check.h" > > + > > +void __attribute__((noipa)) > > +check_fmsubadd (float * __restrict a, float *b, float *c, int n) > > +{ > > + for (int i = 0; i < n; ++i) > > + { > > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; > > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; > > + } > > +} > > + > > +static void > > +fma_test (void) > > +{ > > + float a[4], b[4], c[4]; > > + for (int i = 0; i < 4; ++i) > > + { > > + a[i] = i; > > + b[i] = 3*i; > > + c[i] = 7*i; > > + } > > + check_fmsubadd (a, b, c, 2); > > + const float d[4] = { 0., 20., 86., 186. 
}; > > + for (int i = 0; i < 4; ++i) > > + if (a[i] != d[i]) > > + __builtin_abort (); > > +} > > + > > +/* { dg-final { scan-assembler "fmsubadd...ps" } } */ > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c > > index 2671f91972d..f774cac4a4d 100644 > > --- a/gcc/tree-vect-slp-patterns.c > > +++ b/gcc/tree-vect-slp-patterns.c > > @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */) > > class addsub_pattern : public vect_pattern > > { > > public: > > - addsub_pattern (slp_tree *node) > > - : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {}; > > + addsub_pattern (slp_tree *node, internal_fn ifn) > > + : vect_pattern (node, NULL, ifn) {}; > > > > void build (vec_info *); > > > > @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) > > { > > slp_tree node = *node_; > > if (SLP_TREE_CODE (node) != VEC_PERM_EXPR > > - || SLP_TREE_CHILDREN (node).length () != 2) > > + || SLP_TREE_CHILDREN (node).length () != 2 > > + || SLP_TREE_LANE_PERMUTATION (node).length () % 2) > > return NULL; > > > > /* Match a blend of a plus and a minus op with the same number of plus and > > minus lanes on the same operands. 
*/ > > - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; > > - slp_tree add = SLP_TREE_CHILDREN (node)[1]; > > - bool swapped_p = false; > > - if (vect_match_expression_p (sub, PLUS_EXPR)) > > - { > > - std::swap (add, sub); > > - swapped_p = true; > > - } > > - if (!(vect_match_expression_p (add, PLUS_EXPR) > > - && vect_match_expression_p (sub, MINUS_EXPR))) > > + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; > > + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; > > + if (l0 == l1) > > + return NULL; > > + bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], > > + PLUS_EXPR); > > + if (!l0add_p > > + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR)) > > + return NULL; > > + bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], > > + PLUS_EXPR); > > + if (!l1add_p > > + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR)) > > return NULL; > > - if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0] > > - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1]) > > - || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1] > > - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0]))) > > + > > + slp_tree l0node = SLP_TREE_CHILDREN (node)[l0]; > > + slp_tree l1node = SLP_TREE_CHILDREN (node)[l1]; > > + if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0] > > + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1]) > > + || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1] > > + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0]))) > > return NULL; > > > > for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i) > > { > > std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i]; > > - if (swapped_p) > > - perm.first = perm.first == 0 ? 1 : 0; > > - /* It has to be alternating -, +, -, ... 
> > + /* It has to be alternating -, +, -, > > While we could permute the .ADDSUB inputs and the .ADDSUB output > > that's only profitable over the add + sub + blend if at least > > one of the permute is optimized which we can't determine here. */ > > - if (perm.first != (i & 1) > > + if (perm.first != ((i & 1) ? l1 : l0) > > || perm.second != i) > > return NULL; > > } > > > > - if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) > > - return NULL; > > + /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... } > > + (l0add_p), see whether we have FMA variants. */ > > + if (!l0add_p > > + && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR)) > > + { > > + /* (c * d) -+ a */ > > + if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node)) > > + return new addsub_pattern (node_, IFN_VEC_FMADDSUB); > > + } > > + else if (l0add_p > > + && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR)) > > + { > > + /* (c * d) +- a */ > > + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node)) > > + return new addsub_pattern (node_, IFN_VEC_FMSUBADD); > > + } > > > > - return new addsub_pattern (node_); > > + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) > > + return new addsub_pattern (node_, IFN_VEC_ADDSUB); > > + > > + return NULL; > > } > > > > void > > @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo) > > { > > slp_tree node = *m_node; > > > > - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; > > - slp_tree add = SLP_TREE_CHILDREN (node)[1]; > > - if (vect_match_expression_p (sub, PLUS_EXPR)) > > - std::swap (add, sub); > > - > > - /* Modify the blend node in-place. */ > > - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; > > - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; > > - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > > - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > > - > > - /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ > > - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); > > - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, > > - gimple_assign_rhs1 (rep->stmt), > > - gimple_assign_rhs2 (rep->stmt)); > > - gimple_call_set_lhs (call, make_ssa_name > > - (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); > > - gimple_call_set_nothrow (call, true); > > - gimple_set_bb (call, gimple_bb (rep->stmt)); > > - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); > > - SLP_TREE_REPRESENTATIVE (node) = new_rep; > > - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > > - STMT_SLP_TYPE (new_rep) = pure_slp; > > - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > > - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > > - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); > > - SLP_TREE_CODE (node) = ERROR_MARK; > > - SLP_TREE_LANE_PERMUTATION (node).release (); > > - > > - vect_free_slp_tree (sub); > > - vect_free_slp_tree (add); > > + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; > > + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; > > + > > + switch (m_ifn) > > + { > > + case IFN_VEC_ADDSUB: > > + { > > + slp_tree sub = SLP_TREE_CHILDREN (node)[l0]; > > + slp_tree add = SLP_TREE_CHILDREN (node)[l1]; > > + > > + /* Modify the blend node in-place. */ > > + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; > > + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; > > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > > + > > + /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ > > + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); > > + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, > > + gimple_assign_rhs1 (rep->stmt), > > + gimple_assign_rhs2 (rep->stmt)); > > + gimple_call_set_lhs (call, make_ssa_name > > + (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); > > + gimple_call_set_nothrow (call, true); > > + gimple_set_bb (call, gimple_bb (rep->stmt)); > > + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); > > + SLP_TREE_REPRESENTATIVE (node) = new_rep; > > + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > > + STMT_SLP_TYPE (new_rep) = pure_slp; > > + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > > + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > > + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); > > + SLP_TREE_CODE (node) = ERROR_MARK; > > + SLP_TREE_LANE_PERMUTATION (node).release (); > > + > > + vect_free_slp_tree (sub); > > + vect_free_slp_tree (add); > > + break; > > + } > > + case IFN_VEC_FMADDSUB: > > + case IFN_VEC_FMSUBADD: > > + { > > + slp_tree sub, add; > > + if (m_ifn == IFN_VEC_FMADDSUB) > > + { > > + sub = SLP_TREE_CHILDREN (node)[l0]; > > + add = SLP_TREE_CHILDREN (node)[l1]; > > + } > > + else /* m_ifn == IFN_VEC_FMSUBADD */ > > + { > > + sub = SLP_TREE_CHILDREN (node)[l1]; > > + add = SLP_TREE_CHILDREN (node)[l0]; > > + } > > + slp_tree mul = SLP_TREE_CHILDREN (sub)[0]; > > + /* Modify the blend node in-place. */ > > + SLP_TREE_CHILDREN (node).safe_grow (3, true); > > + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0]; > > + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1]; > > + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1]; > > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++; > > + > > + /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. 
*/ > > + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub); > > + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul); > > + gcall *call = gimple_build_call_internal (m_ifn, 3, > > + gimple_assign_rhs1 (mrep->stmt), > > + gimple_assign_rhs2 (mrep->stmt), > > + gimple_assign_rhs2 (srep->stmt)); > > + gimple_call_set_lhs (call, make_ssa_name > > + (TREE_TYPE (gimple_assign_lhs (srep->stmt)))); > > + gimple_call_set_nothrow (call, true); > > + gimple_set_bb (call, gimple_bb (srep->stmt)); > > + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep); > > + SLP_TREE_REPRESENTATIVE (node) = new_rep; > > + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > > + STMT_SLP_TYPE (new_rep) = pure_slp; > > + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > > + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > > + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep)); > > + SLP_TREE_CODE (node) = ERROR_MARK; > > + SLP_TREE_LANE_PERMUTATION (node).release (); > > + > > + vect_free_slp_tree (sub); > > + vect_free_slp_tree (add); > > + break; > > + } > > + default:; > > + } > > } > > > > /******************************************************************************* > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > > index f08797c2bc0..5357cd0e7a4 100644 > > --- a/gcc/tree-vect-slp.c > > +++ b/gcc/tree-vect-slp.c > > @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo) > > case CFN_COMPLEX_MUL: > > case CFN_COMPLEX_MUL_CONJ: > > case CFN_VEC_ADDSUB: > > + case CFN_VEC_FMADDSUB: > > + case CFN_VEC_FMSUBADD: > > vertices[idx].perm_in = 0; > > vertices[idx].perm_out = 0; > > default:; > > -- > > 2.26.2 > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 8+ messages in thread
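[Editorial note] The heart of the refactored addsub_pattern::recognize above is the lane-permutation check: blend lane i has to select lane i of the two children in strict alternation, child l0 on even lanes and child l1 on odd lanes. A stand-alone scalar model of just that check follows — the struct and function names are hypothetical, and the real code walks SLP_TREE_LANE_PERMUTATION and additionally matches the PLUS/MINUS/MULT operands:

```c
#include <assert.h>

/* One blend-lane descriptor: which child node and which lane in it.  */
struct lane_sel { unsigned child; unsigned lane; };

/* Return nonzero if PERM describes the alternating blend that maps to
   .ADDSUB/.FMADDSUB/.FMSUBADD: even lanes select from child L0, odd
   lanes from child L1, and every lane i selects lane i of its child.  */
static int
alternating_blend_p (const struct lane_sel *perm, unsigned nlanes,
		     unsigned l0, unsigned l1)
{
  if (nlanes % 2 || l0 == l1)
    return 0;
  for (unsigned i = 0; i < nlanes; ++i)
    if (perm[i].child != ((i & 1) ? l1 : l0)
	|| perm[i].lane != i)
      return 0;
  return 1;
}
```

For a four-lane blend { sub[0], add[1], sub[2], add[3] } the permutation { {0,0}, {1,1}, {0,2}, {1,3} } passes with l0 = 0, l1 = 1; any other lane order is rejected, matching the comment in the patch that permuted variants are only profitable when one of the permutes can be elided.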
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs 2021-07-05 14:09 [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs Richard Biener 2021-07-05 14:25 ` Richard Biener @ 2021-07-06 2:16 ` Hongtao Liu 2021-07-06 7:42 ` Richard Biener 1 sibling, 1 reply; 8+ messages in thread From: Hongtao Liu @ 2021-07-06 2:16 UTC (permalink / raw) To: Richard Biener; +Cc: GCC Patches, Liu, Hongtao On Mon, Jul 5, 2021 at 10:09 PM Richard Biener <rguenther@suse.de> wrote: > > This adds named expanders for vec_fmaddsub<mode>4 and > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and > vfmsubaddXXXp{ds} instructions. This complements the previous > addition of ADDSUB support. > > x86 lacks SUBADD and the negate variants of FMA with mixed > plus minus so I did not add optabs or patterns for those but > it would not be difficult if there's a target that has them. > Maybe one of the complex fma patterns match those variants? > > I did not dare to rewrite the numerous patterns to the new > canonical name but instead added two new expanders. Note I > did not cover AVX512 since the existing patterns are separated > and I have no easy way to test things there. Handling AVX512 > should be easy as followup though. > > Bootstrap and testing on x86_64-unknown-linux-gnu in progress. > > Any comments? > > Thanks, > Richard. > > 2021-07-05 Richard Biener <rguenther@suse.de> > > * doc/md.texi (vec_fmaddsub<mode>4): Document. > (vec_fmsubadd<mode>4): Likewise. > * optabs.def (vec_fmaddsub$a4): Add. > (vec_fmsubadd$a4): Likewise. > * internal-fn.def (IFN_VEC_FMADDSUB): Add. > (IFN_VEC_FMSUBADD): Likewise. > * tree-vect-slp-patterns.c (addsub_pattern::recognize): > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD. > (addsub_pattern::build): Likewise. > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB > and CFN_VEC_FMSUBADD are not transparent for permutes. > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander. 
> (vec_fmsubadd<mode>4): Likewise. > > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase. > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise. > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise. > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise. > --- > gcc/config/i386/sse.md | 19 ++ > gcc/doc/md.texi | 14 ++ > gcc/internal-fn.def | 3 +- > gcc/optabs.def | 2 + > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++ > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++ > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++ > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++ > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++----- > gcc/tree-vect-slp.c | 2 + > 10 files changed, 311 insertions(+), 57 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index bcf1605d147..6fc13c184bf 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -4644,6 +4644,25 @@ > ;; > ;; But this doesn't seem useful in practice. 
> > +(define_expand "vec_fmaddsub<mode>4" > + [(set (match_operand:VF 0 "register_operand") > + (unspec:VF > + [(match_operand:VF 1 "nonimmediate_operand") > + (match_operand:VF 2 "nonimmediate_operand") > + (match_operand:VF 3 "nonimmediate_operand")] > + UNSPEC_FMADDSUB))] > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > + > +(define_expand "vec_fmsubadd<mode>4" > + [(set (match_operand:VF 0 "register_operand") > + (unspec:VF > + [(match_operand:VF 1 "nonimmediate_operand") > + (match_operand:VF 2 "nonimmediate_operand") > + (neg:VF > + (match_operand:VF 3 "nonimmediate_operand"))] > + UNSPEC_FMADDSUB))] > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > + What about a condition like "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)"? The original expander "fmaddsub_<mode>" is only used by builtins, which have their own guard for AVX512VL, so it doesn't matter that it doesn't have TARGET_AVX512VL: BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_fmaddsub_v4df_mask, "__builtin_ia32_vfmaddsubpd256_mask", IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DF_UQI) > (define_expand "fmaddsub_<mode>" > [(set (match_operand:VF 0 "register_operand") > (unspec:VF > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 1b918144330..cc92ebd26aa 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd > lanes doing addition. Operands 1 and 2 and the output operand are vectors > with mode @var{m}. > > +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern > +@item @samp{vec_fmaddsub@var{m}4} > +Alternating multiply subtract, add with even lanes doing subtract and odd > +lanes doing addition of the third operand to the multiplication result > +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors > +with mode @var{m}.
> + > +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern > +@item @samp{vec_fmsubadd@var{m}4} > +Alternating multiply add, subtract with even lanes doing addition and odd > +lanes doing subtraction of the third operand to the multiplication result > +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors > +with mode @var{m}. > + > These instructions are not allowed to @code{FAIL}. > > @cindex @code{mulhisi3} instruction pattern > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index c3b8e730960..a7003d5da8e 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) > DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) > - > +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) > +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) > > /* FP scales.
*/ > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > diff --git a/gcc/optabs.def b/gcc/optabs.def > index 41ab2598eb6..51acc1be8f5 100644 > --- a/gcc/optabs.def > +++ b/gcc/optabs.def > @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") > OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") > OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") > OPTAB_D (vec_addsub_optab, "vec_addsub$a3") > +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") > +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") > > OPTAB_D (sync_add_optab, "sync_add$I$a") > OPTAB_D (sync_and_optab, "sync_and$I$a") > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > new file mode 100644 > index 00000000000..b30d10731a7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmaddsub (double * __restrict a, double *b, double *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + double a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmaddsub (a, b, c, 2); > + const double d[4] = { 0., 22., 82., 192. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmaddsub...pd" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > new file mode 100644 > index 00000000000..cd2af8725a3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmaddsub (float * __restrict a, float *b, float *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + float a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmaddsub (a, b, c, 2); > + const float d[4] = { 0., 22., 82., 192. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmaddsub...ps" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > new file mode 100644 > index 00000000000..7ca2a275cc1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmsubadd (double * __restrict a, double *b, double *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + double a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmsubadd (a, b, c, 2); > + const double d[4] = { 0., 20., 86., 186. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmsubadd...pd" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > new file mode 100644 > index 00000000000..9ddd0e423db > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmsubadd (float * __restrict a, float *b, float *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + float a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmsubadd (a, b, c, 2); > + const float d[4] = { 0., 20., 86., 186. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmsubadd...ps" } } */ > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c > index 2671f91972d..f774cac4a4d 100644 > --- a/gcc/tree-vect-slp-patterns.c > +++ b/gcc/tree-vect-slp-patterns.c > @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */) > class addsub_pattern : public vect_pattern > { > public: > - addsub_pattern (slp_tree *node) > - : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {}; > + addsub_pattern (slp_tree *node, internal_fn ifn) > + : vect_pattern (node, NULL, ifn) {}; > > void build (vec_info *); > > @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) > { > slp_tree node = *node_; > if (SLP_TREE_CODE (node) != VEC_PERM_EXPR > - || SLP_TREE_CHILDREN (node).length () != 2) > + || SLP_TREE_CHILDREN (node).length () != 2 > + || SLP_TREE_LANE_PERMUTATION (node).length () % 2) > return NULL; > > /* Match a blend of a plus and a minus op with the same number of plus and > minus lanes on the same operands. 
*/ > - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; > - slp_tree add = SLP_TREE_CHILDREN (node)[1]; > - bool swapped_p = false; > - if (vect_match_expression_p (sub, PLUS_EXPR)) > - { > - std::swap (add, sub); > - swapped_p = true; > - } > - if (!(vect_match_expression_p (add, PLUS_EXPR) > - && vect_match_expression_p (sub, MINUS_EXPR))) > + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; > + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; > + if (l0 == l1) > + return NULL; > + bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], > + PLUS_EXPR); > + if (!l0add_p > + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR)) > + return NULL; > + bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], > + PLUS_EXPR); > + if (!l1add_p > + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR)) > return NULL; > - if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0] > - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1]) > - || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1] > - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0]))) > + > + slp_tree l0node = SLP_TREE_CHILDREN (node)[l0]; > + slp_tree l1node = SLP_TREE_CHILDREN (node)[l1]; > + if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0] > + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1]) > + || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1] > + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0]))) > return NULL; > > for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i) > { > std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i]; > - if (swapped_p) > - perm.first = perm.first == 0 ? 1 : 0; > - /* It has to be alternating -, +, -, ... 
> + /* It has to be alternating -, +, -, > While we could permute the .ADDSUB inputs and the .ADDSUB output > that's only profitable over the add + sub + blend if at least > one of the permute is optimized which we can't determine here. */ > - if (perm.first != (i & 1) > + if (perm.first != ((i & 1) ? l1 : l0) > || perm.second != i) > return NULL; > } > > - if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) > - return NULL; > + /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... } > + (l0add_p), see whether we have FMA variants. */ > + if (!l0add_p > + && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR)) > + { > + /* (c * d) -+ a */ > + if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node)) > + return new addsub_pattern (node_, IFN_VEC_FMADDSUB); > + } > + else if (l0add_p > + && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR)) > + { > + /* (c * d) +- a */ > + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node)) > + return new addsub_pattern (node_, IFN_VEC_FMSUBADD); > + } > > - return new addsub_pattern (node_); > + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) > + return new addsub_pattern (node_, IFN_VEC_ADDSUB); > + > + return NULL; > } > > void > @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo) > { > slp_tree node = *m_node; > > - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; > - slp_tree add = SLP_TREE_CHILDREN (node)[1]; > - if (vect_match_expression_p (sub, PLUS_EXPR)) > - std::swap (add, sub); > - > - /* Modify the blend node in-place. */ > - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; > - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; > - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > - > - /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ > - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); > - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, > - gimple_assign_rhs1 (rep->stmt), > - gimple_assign_rhs2 (rep->stmt)); > - gimple_call_set_lhs (call, make_ssa_name > - (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); > - gimple_call_set_nothrow (call, true); > - gimple_set_bb (call, gimple_bb (rep->stmt)); > - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); > - SLP_TREE_REPRESENTATIVE (node) = new_rep; > - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > - STMT_SLP_TYPE (new_rep) = pure_slp; > - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); > - SLP_TREE_CODE (node) = ERROR_MARK; > - SLP_TREE_LANE_PERMUTATION (node).release (); > - > - vect_free_slp_tree (sub); > - vect_free_slp_tree (add); > + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; > + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; > + > + switch (m_ifn) > + { > + case IFN_VEC_ADDSUB: > + { > + slp_tree sub = SLP_TREE_CHILDREN (node)[l0]; > + slp_tree add = SLP_TREE_CHILDREN (node)[l1]; > + > + /* Modify the blend node in-place. */ > + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; > + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > + > + /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ > + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); > + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, > + gimple_assign_rhs1 (rep->stmt), > + gimple_assign_rhs2 (rep->stmt)); > + gimple_call_set_lhs (call, make_ssa_name > + (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); > + gimple_call_set_nothrow (call, true); > + gimple_set_bb (call, gimple_bb (rep->stmt)); > + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); > + SLP_TREE_REPRESENTATIVE (node) = new_rep; > + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > + STMT_SLP_TYPE (new_rep) = pure_slp; > + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); > + SLP_TREE_CODE (node) = ERROR_MARK; > + SLP_TREE_LANE_PERMUTATION (node).release (); > + > + vect_free_slp_tree (sub); > + vect_free_slp_tree (add); > + break; > + } > + case IFN_VEC_FMADDSUB: > + case IFN_VEC_FMSUBADD: > + { > + slp_tree sub, add; > + if (m_ifn == IFN_VEC_FMADDSUB) > + { > + sub = SLP_TREE_CHILDREN (node)[l0]; > + add = SLP_TREE_CHILDREN (node)[l1]; > + } > + else /* m_ifn == IFN_VEC_FMSUBADD */ > + { > + sub = SLP_TREE_CHILDREN (node)[l1]; > + add = SLP_TREE_CHILDREN (node)[l0]; > + } > + slp_tree mul = SLP_TREE_CHILDREN (sub)[0]; > + /* Modify the blend node in-place. */ > + SLP_TREE_CHILDREN (node).safe_grow (3, true); > + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0]; > + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1]; > + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1]; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++; > + > + /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. 
*/ > + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub); > + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul); > + gcall *call = gimple_build_call_internal (m_ifn, 3, > + gimple_assign_rhs1 (mrep->stmt), > + gimple_assign_rhs2 (mrep->stmt), > + gimple_assign_rhs2 (srep->stmt)); > + gimple_call_set_lhs (call, make_ssa_name > + (TREE_TYPE (gimple_assign_lhs (srep->stmt)))); > + gimple_call_set_nothrow (call, true); > + gimple_set_bb (call, gimple_bb (srep->stmt)); > + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep); > + SLP_TREE_REPRESENTATIVE (node) = new_rep; > + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > + STMT_SLP_TYPE (new_rep) = pure_slp; > + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep)); > + SLP_TREE_CODE (node) = ERROR_MARK; > + SLP_TREE_LANE_PERMUTATION (node).release (); > + > + vect_free_slp_tree (sub); > + vect_free_slp_tree (add); > + break; > + } > + default:; > + } > } > > /******************************************************************************* > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > index f08797c2bc0..5357cd0e7a4 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo) > case CFN_COMPLEX_MUL: > case CFN_COMPLEX_MUL_CONJ: > case CFN_VEC_ADDSUB: > + case CFN_VEC_FMADDSUB: > + case CFN_VEC_FMSUBADD: > vertices[idx].perm_in = 0; > vertices[idx].perm_out = 0; > default:; > -- > 2.26.2 -- BR, Hongtao ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs 2021-07-06 2:16 ` Hongtao Liu @ 2021-07-06 7:42 ` Richard Biener 2021-07-06 8:29 ` Hongtao Liu 0 siblings, 1 reply; 8+ messages in thread From: Richard Biener @ 2021-07-06 7:42 UTC (permalink / raw) To: Hongtao Liu; +Cc: GCC Patches, Liu, Hongtao On Tue, 6 Jul 2021, Hongtao Liu wrote: > On Mon, Jul 5, 2021 at 10:09 PM Richard Biener <rguenther@suse.de> wrote: > > > > This adds named expanders for vec_fmaddsub<mode>4 and > > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and > > vfmsubaddXXXp{ds} instructions. This complements the previous > > addition of ADDSUB support. > > > > x86 lacks SUBADD and the negate variants of FMA with mixed > > plus minus so I did not add optabs or patterns for those but > > it would not be difficult if there's a target that has them. > > Maybe one of the complex fma patterns match those variants? > > > > I did not dare to rewrite the numerous patterns to the new > > canonical name but instead added two new expanders. Note I > > did not cover AVX512 since the existing patterns are separated > > and I have no easy way to test things there. Handling AVX512 > > should be easy as followup though. > > > > Bootstrap and testing on x86_64-unknown-linux-gnu in progress. > > > > Any comments? > > > > Thanks, > > Richard. > > > > 2021-07-05 Richard Biener <rguenther@suse.de> > > > > * doc/md.texi (vec_fmaddsub<mode>4): Document. > > (vec_fmsubadd<mode>4): Likewise. > > * optabs.def (vec_fmaddsub$a4): Add. > > (vec_fmsubadd$a4): Likewise. > > * internal-fn.def (IFN_VEC_FMADDSUB): Add. > > (IFN_VEC_FMSUBADD): Likewise. > > * tree-vect-slp-patterns.c (addsub_pattern::recognize): > > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD. > > (addsub_pattern::build): Likewise. > > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB > > and CFN_VEC_FMSUBADD are not transparent for permutes. > > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander. 
> > (vec_fmsubadd<mode>4): Likewise. > > > > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase. > > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise. > > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise. > > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise. > > --- > > gcc/config/i386/sse.md | 19 ++ > > gcc/doc/md.texi | 14 ++ > > gcc/internal-fn.def | 3 +- > > gcc/optabs.def | 2 + > > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++ > > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++ > > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++ > > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++ > > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++----- > > gcc/tree-vect-slp.c | 2 + > > 10 files changed, 311 insertions(+), 57 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > index bcf1605d147..6fc13c184bf 100644 > > --- a/gcc/config/i386/sse.md > > +++ b/gcc/config/i386/sse.md > > @@ -4644,6 +4644,25 @@ > > ;; > > ;; But this doesn't seem useful in practice. 
> > > > +(define_expand "vec_fmaddsub<mode>4" > > + [(set (match_operand:VF 0 "register_operand") > > + (unspec:VF > > + [(match_operand:VF 1 "nonimmediate_operand") > > + (match_operand:VF 2 "nonimmediate_operand") > > + (match_operand:VF 3 "nonimmediate_operand")] > > + UNSPEC_FMADDSUB))] > > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > > + > > +(define_expand "vec_fmsubadd<mode>4" > > + [(set (match_operand:VF 0 "register_operand") > > + (unspec:VF > > + [(match_operand:VF 1 "nonimmediate_operand") > > + (match_operand:VF 2 "nonimmediate_operand") > > + (neg:VF > > + (match_operand:VF 3 "nonimmediate_operand"))] > > + UNSPEC_FMADDSUB))] > > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > > + > > What about a condition like > "TARGET_FMA || TARGET_FMA4 > || (<MODE_SIZE> == 64 || TARGET_AVX512VL)"? > > the original expander "fmaddsub_<mode>" is only used by builtins, which > have their own guard for AVX512VL, so it doesn't matter that it doesn't have > TARGET_AVX512VL: > BDESC (OPTION_MASK_ISA_AVX512VL, 0, > CODE_FOR_avx512vl_fmaddsub_v4df_mask, > "__builtin_ia32_vfmaddsubpd256_mask", > IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int) > V4DF_FTYPE_V4DF_V4DF_V4DF_UQI) OK, that seems to work! Bootstrapped and tested on x86_64-unknown-linux-gnu - are the x86 backend changes OK? Thanks, Richard. This adds named expanders for vec_fmaddsub<mode>4 and vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and vfmsubaddXXXp{ds} instructions. This complements the previous addition of ADDSUB support. x86 lacks SUBADD and the negate variants of FMA with mixed plus minus so I did not add optabs or patterns for those but it would not be difficult if there's a target that has them. 2021-07-05 Richard Biener <rguenther@suse.de> * doc/md.texi (vec_fmaddsub<mode>4): Document. (vec_fmsubadd<mode>4): Likewise. * optabs.def (vec_fmaddsub$a4): Add. (vec_fmsubadd$a4): Likewise. * internal-fn.def (IFN_VEC_FMADDSUB): Add. (IFN_VEC_FMSUBADD): Likewise.
* tree-vect-slp-patterns.c (addsub_pattern::recognize): Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD. (addsub_pattern::build): Likewise. * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB and CFN_VEC_FMSUBADD are not transparent for permutes. * config/i386/sse.md (vec_fmaddsub<mode>4): New expander. (vec_fmsubadd<mode>4): Likewise. * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase. * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise. * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise. * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise. --- gcc/config/i386/sse.md | 19 ++ gcc/doc/md.texi | 14 ++ gcc/internal-fn.def | 3 +- gcc/optabs.def | 2 + .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++ .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++ .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++ .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++ gcc/tree-vect-slp-patterns.c | 192 +++++++++++++----- gcc/tree-vect-slp.c | 2 + 10 files changed, 311 insertions(+), 57 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index bcf1605d147..17c9e571d5d 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -4644,6 +4644,25 @@ ;; ;; But this doesn't seem useful in practice. 
+(define_expand "vec_fmaddsub<mode>4" + [(set (match_operand:VF 0 "register_operand") + (unspec:VF + [(match_operand:VF 1 "nonimmediate_operand") + (match_operand:VF 2 "nonimmediate_operand") + (match_operand:VF 3 "nonimmediate_operand")] + UNSPEC_FMADDSUB))] + "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)") + +(define_expand "vec_fmsubadd<mode>4" + [(set (match_operand:VF 0 "register_operand") + (unspec:VF + [(match_operand:VF 1 "nonimmediate_operand") + (match_operand:VF 2 "nonimmediate_operand") + (neg:VF + (match_operand:VF 3 "nonimmediate_operand"))] + UNSPEC_FMADDSUB))] + "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)") + (define_expand "fmaddsub_<mode>" [(set (match_operand:VF 0 "register_operand") (unspec:VF diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 1b918144330..cc92ebd26aa 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd lanes doing addition. Operands 1 and 2 and the output operand are vectors with mode @var{m}. +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern +@item @samp{vec_fmaddsub@var{m}4} +Alternating multiply subtract, add with even lanes doing subtract and odd +lanes doing addition of the third operand to the multiplication result +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors +with mode @var{m}. + +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern +@item @samp{vec_fmsubadd@var{m}4} +Alternating multiply add, subtract with even lanes doing addition and odd +lanes doing subtraction of the third operand to the multiplication result +of the first two operands. Operands 1, 2 and 3 and the output operand are vectors +with mode @var{m}. + These instructions are not allowed to @code{FAIL}.
@cindex @code{mulhisi3} instruction pattern diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index c3b8e730960..a7003d5da8e 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) - +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git a/gcc/optabs.def b/gcc/optabs.def index 41ab2598eb6..51acc1be8f5 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") OPTAB_D (vec_addsub_optab, "vec_addsub$a3") +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") OPTAB_D (sync_add_optab, "sync_add$I$a") OPTAB_D (sync_and_optab, "sync_and$I$a") diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c new file mode 100644 index 00000000000..b30d10731a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O3 -mfma -save-temps" } */ + +#include "fma-check.h" + +void __attribute__((noipa)) +check_fmaddsub (double * __restrict a, double *b, double *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; + } +} + +static void +fma_test (void) +{ + double a[4], b[4], c[4]; + for (int i = 0; i < 4; ++i) + { + a[i] = i; + 
b[i] = 3*i; + c[i] = 7*i; + } + check_fmaddsub (a, b, c, 2); + const double d[4] = { 0., 22., 82., 192. }; + for (int i = 0; i < 4; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler "fmaddsub...pd" } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c new file mode 100644 index 00000000000..cd2af8725a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O3 -mfma -save-temps" } */ + +#include "fma-check.h" + +void __attribute__((noipa)) +check_fmaddsub (float * __restrict a, float *b, float *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; + } +} + +static void +fma_test (void) +{ + float a[4], b[4], c[4]; + for (int i = 0; i < 4; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmaddsub (a, b, c, 2); + const float d[4] = { 0., 22., 82., 192. 
}; + for (int i = 0; i < 4; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler "fmaddsub...ps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c new file mode 100644 index 00000000000..7ca2a275cc1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O3 -mfma -save-temps" } */ + +#include "fma-check.h" + +void __attribute__((noipa)) +check_fmsubadd (double * __restrict a, double *b, double *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; + } +} + +static void +fma_test (void) +{ + double a[4], b[4], c[4]; + for (int i = 0; i < 4; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmsubadd (a, b, c, 2); + const double d[4] = { 0., 20., 86., 186. }; + for (int i = 0; i < 4; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler "fmsubadd...pd" } } */ diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c new file mode 100644 index 00000000000..9ddd0e423db --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c @@ -0,0 +1,34 @@ +/* { dg-do run } */ +/* { dg-require-effective-target fma } */ +/* { dg-options "-O3 -mfma -save-temps" } */ + +#include "fma-check.h" + +void __attribute__((noipa)) +check_fmsubadd (float * __restrict a, float *b, float *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; + } +} + +static void +fma_test (void) +{ + float a[4], b[4], c[4]; + for (int i = 0; i < 4; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmsubadd (a, b, c, 2); + const float d[4] = { 0., 20., 86., 186. 
}; + for (int i = 0; i < 4; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler "fmsubadd...ps" } } */ diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c index 2671f91972d..f774cac4a4d 100644 --- a/gcc/tree-vect-slp-patterns.c +++ b/gcc/tree-vect-slp-patterns.c @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */) class addsub_pattern : public vect_pattern { public: - addsub_pattern (slp_tree *node) - : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {}; + addsub_pattern (slp_tree *node, internal_fn ifn) + : vect_pattern (node, NULL, ifn) {}; void build (vec_info *); @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) { slp_tree node = *node_; if (SLP_TREE_CODE (node) != VEC_PERM_EXPR - || SLP_TREE_CHILDREN (node).length () != 2) + || SLP_TREE_CHILDREN (node).length () != 2 + || SLP_TREE_LANE_PERMUTATION (node).length () % 2) return NULL; /* Match a blend of a plus and a minus op with the same number of plus and minus lanes on the same operands. 
*/ - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; - slp_tree add = SLP_TREE_CHILDREN (node)[1]; - bool swapped_p = false; - if (vect_match_expression_p (sub, PLUS_EXPR)) - { - std::swap (add, sub); - swapped_p = true; - } - if (!(vect_match_expression_p (add, PLUS_EXPR) - && vect_match_expression_p (sub, MINUS_EXPR))) + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; + if (l0 == l1) + return NULL; + bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], + PLUS_EXPR); + if (!l0add_p + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR)) + return NULL; + bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], + PLUS_EXPR); + if (!l1add_p + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR)) return NULL; - if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0] - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1]) - || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1] - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0]))) + + slp_tree l0node = SLP_TREE_CHILDREN (node)[l0]; + slp_tree l1node = SLP_TREE_CHILDREN (node)[l1]; + if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0] + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1]) + || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1] + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0]))) return NULL; for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i) { std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i]; - if (swapped_p) - perm.first = perm.first == 0 ? 1 : 0; - /* It has to be alternating -, +, -, ... + /* It has to be alternating -, +, -, While we could permute the .ADDSUB inputs and the .ADDSUB output that's only profitable over the add + sub + blend if at least one of the permute is optimized which we can't determine here. 
*/ - if (perm.first != (i & 1) + if (perm.first != ((i & 1) ? l1 : l0) || perm.second != i) return NULL; } - if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) - return NULL; + /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... } + (l0add_p), see whether we have FMA variants. */ + if (!l0add_p + && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR)) + { + /* (c * d) -+ a */ + if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node)) + return new addsub_pattern (node_, IFN_VEC_FMADDSUB); + } + else if (l0add_p + && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR)) + { + /* (c * d) +- a */ + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node)) + return new addsub_pattern (node_, IFN_VEC_FMSUBADD); + } - return new addsub_pattern (node_); + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) + return new addsub_pattern (node_, IFN_VEC_ADDSUB); + + return NULL; } void @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo) { slp_tree node = *m_node; - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; - slp_tree add = SLP_TREE_CHILDREN (node)[1]; - if (vect_match_expression_p (sub, PLUS_EXPR)) - std::swap (add, sub); - - /* Modify the blend node in-place. */ - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; - - /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, - gimple_assign_rhs1 (rep->stmt), - gimple_assign_rhs2 (rep->stmt)); - gimple_call_set_lhs (call, make_ssa_name - (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); - gimple_call_set_nothrow (call, true); - gimple_set_bb (call, gimple_bb (rep->stmt)); - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); - SLP_TREE_REPRESENTATIVE (node) = new_rep; - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; - STMT_SLP_TYPE (new_rep) = pure_slp; - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); - SLP_TREE_CODE (node) = ERROR_MARK; - SLP_TREE_LANE_PERMUTATION (node).release (); - - vect_free_slp_tree (sub); - vect_free_slp_tree (add); + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; + + switch (m_ifn) + { + case IFN_VEC_ADDSUB: + { + slp_tree sub = SLP_TREE_CHILDREN (node)[l0]; + slp_tree add = SLP_TREE_CHILDREN (node)[l1]; + + /* Modify the blend node in-place. */ + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; + + /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, + gimple_assign_rhs1 (rep->stmt), + gimple_assign_rhs2 (rep->stmt)); + gimple_call_set_lhs (call, make_ssa_name + (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); + gimple_call_set_nothrow (call, true); + gimple_set_bb (call, gimple_bb (rep->stmt)); + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); + SLP_TREE_REPRESENTATIVE (node) = new_rep; + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; + STMT_SLP_TYPE (new_rep) = pure_slp; + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); + SLP_TREE_CODE (node) = ERROR_MARK; + SLP_TREE_LANE_PERMUTATION (node).release (); + + vect_free_slp_tree (sub); + vect_free_slp_tree (add); + break; + } + case IFN_VEC_FMADDSUB: + case IFN_VEC_FMSUBADD: + { + slp_tree sub, add; + if (m_ifn == IFN_VEC_FMADDSUB) + { + sub = SLP_TREE_CHILDREN (node)[l0]; + add = SLP_TREE_CHILDREN (node)[l1]; + } + else /* m_ifn == IFN_VEC_FMSUBADD */ + { + sub = SLP_TREE_CHILDREN (node)[l1]; + add = SLP_TREE_CHILDREN (node)[l0]; + } + slp_tree mul = SLP_TREE_CHILDREN (sub)[0]; + /* Modify the blend node in-place. */ + SLP_TREE_CHILDREN (node).safe_grow (3, true); + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0]; + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1]; + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1]; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++; + + /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. 
*/ + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub); + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul); + gcall *call = gimple_build_call_internal (m_ifn, 3, + gimple_assign_rhs1 (mrep->stmt), + gimple_assign_rhs2 (mrep->stmt), + gimple_assign_rhs2 (srep->stmt)); + gimple_call_set_lhs (call, make_ssa_name + (TREE_TYPE (gimple_assign_lhs (srep->stmt)))); + gimple_call_set_nothrow (call, true); + gimple_set_bb (call, gimple_bb (srep->stmt)); + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep); + SLP_TREE_REPRESENTATIVE (node) = new_rep; + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; + STMT_SLP_TYPE (new_rep) = pure_slp; + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep)); + SLP_TREE_CODE (node) = ERROR_MARK; + SLP_TREE_LANE_PERMUTATION (node).release (); + + vect_free_slp_tree (sub); + vect_free_slp_tree (add); + break; + } + default:; + } } /******************************************************************************* diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index f08797c2bc0..5357cd0e7a4 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo) case CFN_COMPLEX_MUL: case CFN_COMPLEX_MUL_CONJ: case CFN_VEC_ADDSUB: + case CFN_VEC_FMADDSUB: + case CFN_VEC_FMSUBADD: vertices[idx].perm_in = 0; vertices[idx].perm_out = 0; default:; -- 2.26.2 ^ permalink raw reply [flat|nested] 8+ messages in thread
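[Editorial aside between messages: a minimal scalar sketch of the lane-wise semantics the new vec_fmaddsub/vec_fmsubadd optabs expose — even lanes subtract and odd lanes add for fmaddsub, and the reverse for fmsubadd, fused with the multiply. The helper names below are illustrative, not part of the patch; the expected values match the new testcases in the thread (a[i]=i, b[i]=3*i, c[i]=7*i).]

```c
/* Scalar reference for the per-lane behaviour of the new optabs:
   vec_fmaddsub: even lanes b*c - a, odd lanes b*c + a.
   vec_fmsubadd: even lanes b*c + a, odd lanes b*c - a.
   Illustrative sketch only -- these helpers are not GCC code.  */

void
fmaddsub_ref (double *a, const double *b, const double *c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = (i & 1) ? b[i] * c[i] + a[i] : b[i] * c[i] - a[i];
}

void
fmsubadd_ref (double *a, const double *b, const double *c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = (i & 1) ? b[i] * c[i] - a[i] : b[i] * c[i] + a[i];
}
```

With the testcase inputs, fmaddsub_ref yields { 0., 22., 82., 192. } and fmsubadd_ref yields { 0., 20., 86., 186. }, matching the constants checked in vect-fmaddsubXXXpd.c and vect-fmsubaddXXXpd.c.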
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs 2021-07-06 7:42 ` Richard Biener @ 2021-07-06 8:29 ` Hongtao Liu 2021-07-07 7:30 ` Hongtao Liu 0 siblings, 1 reply; 8+ messages in thread From: Hongtao Liu @ 2021-07-06 8:29 UTC (permalink / raw) To: Richard Biener; +Cc: GCC Patches, Liu, Hongtao On Tue, Jul 6, 2021 at 3:42 PM Richard Biener <rguenther@suse.de> wrote: > > On Tue, 6 Jul 2021, Hongtao Liu wrote: > > > On Mon, Jul 5, 2021 at 10:09 PM Richard Biener <rguenther@suse.de> wrote: > > > > > > This adds named expanders for vec_fmaddsub<mode>4 and > > > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and > > > vfmsubaddXXXp{ds} instructions. This complements the previous > > > addition of ADDSUB support. > > > > > > x86 lacks SUBADD and the negate variants of FMA with mixed > > > plus minus so I did not add optabs or patterns for those but > > > it would not be difficult if there's a target that has them. > > > Maybe one of the complex fma patterns match those variants? > > > > > > I did not dare to rewrite the numerous patterns to the new > > > canonical name but instead added two new expanders. Note I > > > did not cover AVX512 since the existing patterns are separated > > > and I have no easy way to test things there. Handling AVX512 > > > should be easy as followup though. > > > > > > Bootstrap and testing on x86_64-unknown-linux-gnu in progress. > > > > > > Any comments? > > > > > > Thanks, > > > Richard. > > > > > > 2021-07-05 Richard Biener <rguenther@suse.de> > > > > > > * doc/md.texi (vec_fmaddsub<mode>4): Document. > > > (vec_fmsubadd<mode>4): Likewise. > > > * optabs.def (vec_fmaddsub$a4): Add. > > > (vec_fmsubadd$a4): Likewise. > > > * internal-fn.def (IFN_VEC_FMADDSUB): Add. > > > (IFN_VEC_FMSUBADD): Likewise. > > > * tree-vect-slp-patterns.c (addsub_pattern::recognize): > > > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD. > > > (addsub_pattern::build): Likewise. 
> > > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB > > > and CFN_VEC_FMSUBADD are not transparent for permutes. > > > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander. > > > (vec_fmsubadd<mode>4): Likewise. > > > > > > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase. > > > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise. > > > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise. > > > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise. > > > --- > > > gcc/config/i386/sse.md | 19 ++ > > > gcc/doc/md.texi | 14 ++ > > > gcc/internal-fn.def | 3 +- > > > gcc/optabs.def | 2 + > > > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++ > > > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++ > > > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++ > > > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++ > > > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++----- > > > gcc/tree-vect-slp.c | 2 + > > > 10 files changed, 311 insertions(+), 57 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > > > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > > > > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > > > index bcf1605d147..6fc13c184bf 100644 > > > --- a/gcc/config/i386/sse.md > > > +++ b/gcc/config/i386/sse.md > > > @@ -4644,6 +4644,25 @@ > > > ;; > > > ;; But this doesn't seem useful in practice. 
> > > > > > +(define_expand "vec_fmaddsub<mode>4" > > > + [(set (match_operand:VF 0 "register_operand") > > > + (unspec:VF > > > + [(match_operand:VF 1 "nonimmediate_operand") > > > + (match_operand:VF 2 "nonimmediate_operand") > > > + (match_operand:VF 3 "nonimmediate_operand")] > > > + UNSPEC_FMADDSUB))] > > > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > > > + > > > +(define_expand "vec_fmsubadd<mode>4" > > > + [(set (match_operand:VF 0 "register_operand") > > > + (unspec:VF > > > + [(match_operand:VF 1 "nonimmediate_operand") > > > + (match_operand:VF 2 "nonimmediate_operand") > > > + (neg:VF > > > + (match_operand:VF 3 "nonimmediate_operand"))] > > > + UNSPEC_FMADDSUB))] > > > + "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") > > > + > > > > W/ condition like > > "TARGET_FMA || TARGET_FMA4 > > || (<MODE_SIZE> == 64 || TARGET_AVX512VL)“? > > > > the original expander "fmaddsub_<mode>" is only used by builtins which > > have it's own guard for AVX512VL, It doesn't matter if it doesn't have > > TARGET_AVX512VL > > BDESC (OPTION_MASK_ISA_AVX512VL, 0, > > CODE_FOR_avx512vl_fmaddsub_v4df_mask, > > "__builtin_ia32_vfmaddsubpd256_mask", > > IX86_BUILTIN_VFMADDSUBPD256_MASK, UNKNOWN, (int) > > V4DF_FTYPE_V4DF_V4DF_V4DF_UQI) > > OK, that seems to work! > > Bootstrapped and tested on x86_64-unknown-linux-gnu - are the > x86 backend changes OK? > Yes, LGTM. > Thanks, > Richard. > > > This adds named expanders for vec_fmaddsub<mode>4 and > vec_fmsubadd<mode>4 which map to x86 vfmaddsubXXXp{ds} and > vfmsubaddXXXp{ds} instructions. This complements the previous > addition of ADDSUB support. > > x86 lacks SUBADD and the negate variants of FMA with mixed > plus minus so I did not add optabs or patterns for those but > it would not be difficult if there's a target that has them. > > 2021-07-05 Richard Biener <rguenther@suse.de> > > * doc/md.texi (vec_fmaddsub<mode>4): Document. > (vec_fmsubadd<mode>4): Likewise. > * optabs.def (vec_fmaddsub$a4): Add. 
> (vec_fmsubadd$a4): Likewise. > * internal-fn.def (IFN_VEC_FMADDSUB): Add. > (IFN_VEC_FMSUBADD): Likewise. > * tree-vect-slp-patterns.c (addsub_pattern::recognize): > Refactor to handle IFN_VEC_FMADDSUB and IFN_VEC_FMSUBADD. > (addsub_pattern::build): Likewise. > * tree-vect-slp.c (vect_optimize_slp): CFN_VEC_FMADDSUB > and CFN_VEC_FMSUBADD are not transparent for permutes. > * config/i386/sse.md (vec_fmaddsub<mode>4): New expander. > (vec_fmsubadd<mode>4): Likewise. > > * gcc.target/i386/vect-fmaddsubXXXpd.c: New testcase. > * gcc.target/i386/vect-fmaddsubXXXps.c: Likewise. > * gcc.target/i386/vect-fmsubaddXXXpd.c: Likewise. > * gcc.target/i386/vect-fmsubaddXXXps.c: Likewise. > --- > gcc/config/i386/sse.md | 19 ++ > gcc/doc/md.texi | 14 ++ > gcc/internal-fn.def | 3 +- > gcc/optabs.def | 2 + > .../gcc.target/i386/vect-fmaddsubXXXpd.c | 34 ++++ > .../gcc.target/i386/vect-fmaddsubXXXps.c | 34 ++++ > .../gcc.target/i386/vect-fmsubaddXXXpd.c | 34 ++++ > .../gcc.target/i386/vect-fmsubaddXXXps.c | 34 ++++ > gcc/tree-vect-slp-patterns.c | 192 +++++++++++++----- > gcc/tree-vect-slp.c | 2 + > 10 files changed, 311 insertions(+), 57 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > create mode 100644 gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md > index bcf1605d147..17c9e571d5d 100644 > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -4644,6 +4644,25 @@ > ;; > ;; But this doesn't seem useful in practice. 
> > +(define_expand "vec_fmaddsub<mode>4" > + [(set (match_operand:VF 0 "register_operand") > + (unspec:VF > + [(match_operand:VF 1 "nonimmediate_operand") > + (match_operand:VF 2 "nonimmediate_operand") > + (match_operand:VF 3 "nonimmediate_operand")] > + UNSPEC_FMADDSUB))] > + "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)") > + > +(define_expand "vec_fmsubadd<mode>4" > + [(set (match_operand:VF 0 "register_operand") > + (unspec:VF > + [(match_operand:VF 1 "nonimmediate_operand") > + (match_operand:VF 2 "nonimmediate_operand") > + (neg:VF > + (match_operand:VF 3 "nonimmediate_operand"))] > + UNSPEC_FMADDSUB))] > + "TARGET_FMA || TARGET_FMA4 || (<MODE_SIZE> == 64 || TARGET_AVX512VL)") > + > (define_expand "fmaddsub_<mode>" > [(set (match_operand:VF 0 "register_operand") > (unspec:VF > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 1b918144330..cc92ebd26aa 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -5688,6 +5688,20 @@ Alternating subtract, add with even lanes doing subtract and odd > lanes doing addition. Operands 1 and 2 and the outout operand are vectors > with mode @var{m}. > > +@cindex @code{vec_fmaddsub@var{m}4} instruction pattern > +@item @samp{vec_fmaddsub@var{m}4} > +Alternating multiply subtract, add with even lanes doing subtract and odd > +lanes doing addition of the third operand to the multiplication result > +of the first two operands. Operands 1, 2 and 3 and the outout operand are vectors > +with mode @var{m}. > + > +@cindex @code{vec_fmsubadd@var{m}4} instruction pattern > +@item @samp{vec_fmsubadd@var{m}4} > +Alternating multiply add, subtract with even lanes doing addition and odd > +lanes doing subtraction of the third operand to the multiplication result > +of the first two operands. Operands 1, 2 and 3 and the outout operand are vectors > +with mode @var{m}. > + > These instructions are not allowed to @code{FAIL}. 
> > @cindex @code{mulhisi3} instruction pattern > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index c3b8e730960..a7003d5da8e 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -282,7 +282,8 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) > DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) > DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) > - > +DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) > +DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) > > /* FP scales. */ > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > diff --git a/gcc/optabs.def b/gcc/optabs.def > index 41ab2598eb6..51acc1be8f5 100644 > --- a/gcc/optabs.def > +++ b/gcc/optabs.def > @@ -408,6 +408,8 @@ OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") > OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") > OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") > OPTAB_D (vec_addsub_optab, "vec_addsub$a3") > +OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") > +OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") > > OPTAB_D (sync_add_optab, "sync_add$I$a") > OPTAB_D (sync_and_optab, "sync_and$I$a") > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > new file mode 100644 > index 00000000000..b30d10731a7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXpd.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmaddsub (double * __restrict a, double *b, double *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; > + } > +} > + > 
+static void > +fma_test (void) > +{ > + double a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmaddsub (a, b, c, 2); > + const double d[4] = { 0., 22., 82., 192. }; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmaddsub...pd" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > new file mode 100644 > index 00000000000..cd2af8725a3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmaddsubXXXps.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmaddsub (float * __restrict a, float *b, float *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] - a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] + a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + float a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmaddsub (a, b, c, 2); > + const float d[4] = { 0., 22., 82., 192. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmaddsub...ps" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > new file mode 100644 > index 00000000000..7ca2a275cc1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXpd.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmsubadd (double * __restrict a, double *b, double *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + double a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmsubadd (a, b, c, 2); > + const double d[4] = { 0., 20., 86., 186. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmsubadd...pd" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > new file mode 100644 > index 00000000000..9ddd0e423db > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/vect-fmsubaddXXXps.c > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target fma } */ > +/* { dg-options "-O3 -mfma -save-temps" } */ > + > +#include "fma-check.h" > + > +void __attribute__((noipa)) > +check_fmsubadd (float * __restrict a, float *b, float *c, int n) > +{ > + for (int i = 0; i < n; ++i) > + { > + a[2*i + 0] = b[2*i + 0] * c[2*i + 0] + a[2*i + 0]; > + a[2*i + 1] = b[2*i + 1] * c[2*i + 1] - a[2*i + 1]; > + } > +} > + > +static void > +fma_test (void) > +{ > + float a[4], b[4], c[4]; > + for (int i = 0; i < 4; ++i) > + { > + a[i] = i; > + b[i] = 3*i; > + c[i] = 7*i; > + } > + check_fmsubadd (a, b, c, 2); > + const float d[4] = { 0., 20., 86., 186. 
}; > + for (int i = 0; i < 4; ++i) > + if (a[i] != d[i]) > + __builtin_abort (); > +} > + > +/* { dg-final { scan-assembler "fmsubadd...ps" } } */ > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c > index 2671f91972d..f774cac4a4d 100644 > --- a/gcc/tree-vect-slp-patterns.c > +++ b/gcc/tree-vect-slp-patterns.c > @@ -1496,8 +1496,8 @@ complex_operations_pattern::build (vec_info * /* vinfo */) > class addsub_pattern : public vect_pattern > { > public: > - addsub_pattern (slp_tree *node) > - : vect_pattern (node, NULL, IFN_VEC_ADDSUB) {}; > + addsub_pattern (slp_tree *node, internal_fn ifn) > + : vect_pattern (node, NULL, ifn) {}; > > void build (vec_info *); > > @@ -1510,46 +1510,68 @@ addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_) > { > slp_tree node = *node_; > if (SLP_TREE_CODE (node) != VEC_PERM_EXPR > - || SLP_TREE_CHILDREN (node).length () != 2) > + || SLP_TREE_CHILDREN (node).length () != 2 > + || SLP_TREE_LANE_PERMUTATION (node).length () % 2) > return NULL; > > /* Match a blend of a plus and a minus op with the same number of plus and > minus lanes on the same operands. 
*/ > - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; > - slp_tree add = SLP_TREE_CHILDREN (node)[1]; > - bool swapped_p = false; > - if (vect_match_expression_p (sub, PLUS_EXPR)) > - { > - std::swap (add, sub); > - swapped_p = true; > - } > - if (!(vect_match_expression_p (add, PLUS_EXPR) > - && vect_match_expression_p (sub, MINUS_EXPR))) > + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; > + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; > + if (l0 == l1) > + return NULL; > + bool l0add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], > + PLUS_EXPR); > + if (!l0add_p > + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l0], MINUS_EXPR)) > + return NULL; > + bool l1add_p = vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], > + PLUS_EXPR); > + if (!l1add_p > + && !vect_match_expression_p (SLP_TREE_CHILDREN (node)[l1], MINUS_EXPR)) > return NULL; > - if (!((SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[0] > - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[1]) > - || (SLP_TREE_CHILDREN (sub)[0] == SLP_TREE_CHILDREN (add)[1] > - && SLP_TREE_CHILDREN (sub)[1] == SLP_TREE_CHILDREN (add)[0]))) > + > + slp_tree l0node = SLP_TREE_CHILDREN (node)[l0]; > + slp_tree l1node = SLP_TREE_CHILDREN (node)[l1]; > + if (!((SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[0] > + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[1]) > + || (SLP_TREE_CHILDREN (l0node)[0] == SLP_TREE_CHILDREN (l1node)[1] > + && SLP_TREE_CHILDREN (l0node)[1] == SLP_TREE_CHILDREN (l1node)[0]))) > return NULL; > > for (unsigned i = 0; i < SLP_TREE_LANE_PERMUTATION (node).length (); ++i) > { > std::pair<unsigned, unsigned> perm = SLP_TREE_LANE_PERMUTATION (node)[i]; > - if (swapped_p) > - perm.first = perm.first == 0 ? 1 : 0; > - /* It has to be alternating -, +, -, ... 
> + /* It has to be alternating -, +, -, > While we could permute the .ADDSUB inputs and the .ADDSUB output > that's only profitable over the add + sub + blend if at least > one of the permute is optimized which we can't determine here. */ > - if (perm.first != (i & 1) > + if (perm.first != ((i & 1) ? l1 : l0) > || perm.second != i) > return NULL; > } > > - if (!vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) > - return NULL; > + /* Now we have either { -, +, -, + ... } (!l0add_p) or { +, -, +, - ... } > + (l0add_p), see whether we have FMA variants. */ > + if (!l0add_p > + && vect_match_expression_p (SLP_TREE_CHILDREN (l0node)[0], MULT_EXPR)) > + { > + /* (c * d) -+ a */ > + if (vect_pattern_validate_optab (IFN_VEC_FMADDSUB, node)) > + return new addsub_pattern (node_, IFN_VEC_FMADDSUB); > + } > + else if (l0add_p > + && vect_match_expression_p (SLP_TREE_CHILDREN (l1node)[0], MULT_EXPR)) > + { > + /* (c * d) +- a */ > + if (vect_pattern_validate_optab (IFN_VEC_FMSUBADD, node)) > + return new addsub_pattern (node_, IFN_VEC_FMSUBADD); > + } > > - return new addsub_pattern (node_); > + if (!l0add_p && vect_pattern_validate_optab (IFN_VEC_ADDSUB, node)) > + return new addsub_pattern (node_, IFN_VEC_ADDSUB); > + > + return NULL; > } > > void > @@ -1557,38 +1579,96 @@ addsub_pattern::build (vec_info *vinfo) > { > slp_tree node = *m_node; > > - slp_tree sub = SLP_TREE_CHILDREN (node)[0]; > - slp_tree add = SLP_TREE_CHILDREN (node)[1]; > - if (vect_match_expression_p (sub, PLUS_EXPR)) > - std::swap (add, sub); > - > - /* Modify the blend node in-place. */ > - SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; > - SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; > - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > - SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > - > - /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ > - stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); > - gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, > - gimple_assign_rhs1 (rep->stmt), > - gimple_assign_rhs2 (rep->stmt)); > - gimple_call_set_lhs (call, make_ssa_name > - (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); > - gimple_call_set_nothrow (call, true); > - gimple_set_bb (call, gimple_bb (rep->stmt)); > - stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); > - SLP_TREE_REPRESENTATIVE (node) = new_rep; > - STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > - STMT_SLP_TYPE (new_rep) = pure_slp; > - STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > - STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > - STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); > - SLP_TREE_CODE (node) = ERROR_MARK; > - SLP_TREE_LANE_PERMUTATION (node).release (); > - > - vect_free_slp_tree (sub); > - vect_free_slp_tree (add); > + unsigned l0 = SLP_TREE_LANE_PERMUTATION (node)[0].first; > + unsigned l1 = SLP_TREE_LANE_PERMUTATION (node)[1].first; > + > + switch (m_ifn) > + { > + case IFN_VEC_ADDSUB: > + { > + slp_tree sub = SLP_TREE_CHILDREN (node)[l0]; > + slp_tree add = SLP_TREE_CHILDREN (node)[l1]; > + > + /* Modify the blend node in-place. */ > + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (sub)[0]; > + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (sub)[1]; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > + > + /* Build IFN_VEC_ADDSUB from the sub representative operands. 
*/ > + stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (sub); > + gcall *call = gimple_build_call_internal (IFN_VEC_ADDSUB, 2, > + gimple_assign_rhs1 (rep->stmt), > + gimple_assign_rhs2 (rep->stmt)); > + gimple_call_set_lhs (call, make_ssa_name > + (TREE_TYPE (gimple_assign_lhs (rep->stmt)))); > + gimple_call_set_nothrow (call, true); > + gimple_set_bb (call, gimple_bb (rep->stmt)); > + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep); > + SLP_TREE_REPRESENTATIVE (node) = new_rep; > + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > + STMT_SLP_TYPE (new_rep) = pure_slp; > + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep)); > + SLP_TREE_CODE (node) = ERROR_MARK; > + SLP_TREE_LANE_PERMUTATION (node).release (); > + > + vect_free_slp_tree (sub); > + vect_free_slp_tree (add); > + break; > + } > + case IFN_VEC_FMADDSUB: > + case IFN_VEC_FMSUBADD: > + { > + slp_tree sub, add; > + if (m_ifn == IFN_VEC_FMADDSUB) > + { > + sub = SLP_TREE_CHILDREN (node)[l0]; > + add = SLP_TREE_CHILDREN (node)[l1]; > + } > + else /* m_ifn == IFN_VEC_FMSUBADD */ > + { > + sub = SLP_TREE_CHILDREN (node)[l1]; > + add = SLP_TREE_CHILDREN (node)[l0]; > + } > + slp_tree mul = SLP_TREE_CHILDREN (sub)[0]; > + /* Modify the blend node in-place. */ > + SLP_TREE_CHILDREN (node).safe_grow (3, true); > + SLP_TREE_CHILDREN (node)[0] = SLP_TREE_CHILDREN (mul)[0]; > + SLP_TREE_CHILDREN (node)[1] = SLP_TREE_CHILDREN (mul)[1]; > + SLP_TREE_CHILDREN (node)[2] = SLP_TREE_CHILDREN (sub)[1]; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[0])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[1])++; > + SLP_TREE_REF_COUNT (SLP_TREE_CHILDREN (node)[2])++; > + > + /* Build IFN_VEC_FMADDSUB from the mul/sub representative operands. 
*/ > + stmt_vec_info srep = SLP_TREE_REPRESENTATIVE (sub); > + stmt_vec_info mrep = SLP_TREE_REPRESENTATIVE (mul); > + gcall *call = gimple_build_call_internal (m_ifn, 3, > + gimple_assign_rhs1 (mrep->stmt), > + gimple_assign_rhs2 (mrep->stmt), > + gimple_assign_rhs2 (srep->stmt)); > + gimple_call_set_lhs (call, make_ssa_name > + (TREE_TYPE (gimple_assign_lhs (srep->stmt)))); > + gimple_call_set_nothrow (call, true); > + gimple_set_bb (call, gimple_bb (srep->stmt)); > + stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, srep); > + SLP_TREE_REPRESENTATIVE (node) = new_rep; > + STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope; > + STMT_SLP_TYPE (new_rep) = pure_slp; > + STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node); > + STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true; > + STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (srep)); > + SLP_TREE_CODE (node) = ERROR_MARK; > + SLP_TREE_LANE_PERMUTATION (node).release (); > + > + vect_free_slp_tree (sub); > + vect_free_slp_tree (add); > + break; > + } > + default:; > + } > } > > /******************************************************************************* > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > index f08797c2bc0..5357cd0e7a4 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -3728,6 +3728,8 @@ vect_optimize_slp (vec_info *vinfo) > case CFN_COMPLEX_MUL: > case CFN_COMPLEX_MUL_CONJ: > case CFN_VEC_ADDSUB: > + case CFN_VEC_FMADDSUB: > + case CFN_VEC_FMSUBADD: > vertices[idx].perm_in = 0; > vertices[idx].perm_out = 0; > default:; > -- > 2.26.2 -- BR, Hongtao ^ permalink raw reply [flat|nested] 8+ messages in thread
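[Editorial aside between messages: the refactored addsub_pattern::recognize above reduces the blend test to a check on the lane permutation — lane i must take element i from child l0 for even i and from child l1 for odd i, with l0 != l1 and an even lane count. A standalone sketch of that check, with simplified types (this is not the GCC implementation, which works on SLP_TREE_LANE_PERMUTATION pairs):]

```c
/* Illustrative sketch of the alternation check in
   addsub_pattern::recognize.  `first' is the child index and
   `second' the source lane, mirroring the (first, second) pairs
   of SLP_TREE_LANE_PERMUTATION.  Simplified; not GCC code.  */

struct lane_perm { unsigned first, second; };

int
alternating_perm_p (const struct lane_perm *perm, unsigned len,
                    unsigned l0, unsigned l1)
{
  if (len % 2 != 0 || l0 == l1)
    return 0;
  for (unsigned i = 0; i < len; ++i)
    if (perm[i].first != ((i & 1) ? l1 : l0) || perm[i].second != i)
      return 0;
  return 1;
}
```

For example, the permutation { (0,0), (1,1), (0,2), (1,3) } alternates between children 0 and 1 and would be accepted, while { (0,0), (1,1), (1,2), (1,3) } would not.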
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs 2021-07-06 8:29 ` Hongtao Liu @ 2021-07-07 7:30 ` Hongtao Liu 2021-07-07 7:57 ` Richard Biener 0 siblings, 1 reply; 8+ messages in thread From: Hongtao Liu @ 2021-07-07 7:30 UTC (permalink / raw) To: Richard Biener; +Cc: GCC Patches, Liu, Hongtao [-- Attachment #1: Type: text/plain, Size: 227 bytes --] > > > > and I have no easy way to test things there. Handling AVX512 > > > > should be easy as followup though. Here's the patch adding avx512f tests for FMADDSUB/FMSUBADD slp patterns. Pushed to the trunk. -- BR, Hongtao [-- Attachment #2: 0001-i386-Add-avx512-tests-for-MADDSUB-and-FMSUBADD-SLP-v.patch --] [-- Type: text/x-patch, Size: 8576 bytes --] From 2dc666974cca3a62686f4d7135ca36c25d61a802 Mon Sep 17 00:00:00 2001 From: liuhongt <hongtao.liu@intel.com> Date: Wed, 7 Jul 2021 15:19:42 +0800 Subject: [PATCH] [i386] Add avx512 tests for FMADDSUB and FMSUBADD SLP vectorization patterns. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c: New test. * gcc.target/i386/avx512f-vect-fmaddsubXXXps.c: New test. * gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c: New test. * gcc.target/i386/avx512f-vect-fmsubaddXXXps.c: New test. 
--- .../i386/avx512f-vect-fmaddsubXXXpd.c | 41 +++++++++++++++ .../i386/avx512f-vect-fmaddsubXXXps.c | 50 +++++++++++++++++++ .../i386/avx512f-vect-fmsubaddXXXpd.c | 41 +++++++++++++++ .../i386/avx512f-vect-fmsubaddXXXps.c | 50 +++++++++++++++++++ 4 files changed, 182 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c new file mode 100644 index 00000000000..734f9e01443 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXpd.c @@ -0,0 +1,41 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-options "-O3 -mfma -save-temps -mavx512f -mprefer-vector-width=512" } */ + +#include "fma-check.h" +void __attribute__((noipa)) +check_fmaddsub (double * __restrict a, double *b, double *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[8*i + 0] = b[8*i + 0] * c[8*i + 0] - a[8*i + 0]; + a[8*i + 1] = b[8*i + 1] * c[8*i + 1] + a[8*i + 1]; + a[8*i + 2] = b[8*i + 2] * c[8*i + 2] - a[8*i + 2]; + a[8*i + 3] = b[8*i + 3] * c[8*i + 3] + a[8*i + 3]; + a[8*i + 4] = b[8*i + 4] * c[8*i + 4] - a[8*i + 4]; + a[8*i + 5] = b[8*i + 5] * c[8*i + 5] + a[8*i + 5]; + a[8*i + 6] = b[8*i + 6] * c[8*i + 6] - a[8*i + 6]; + a[8*i + 7] = b[8*i + 7] * c[8*i + 7] + a[8*i + 7]; + } +} + +static void +fma_test (void) +{ + if (!__builtin_cpu_supports ("avx512f")) + return; + double a[8], b[8], c[8]; + for (int i = 0; i < 8; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmaddsub (a, b, c, 1); + const double d[8] = { 0., 22., 82., 192., 332., 530., 750., 1036.}; + for (int i = 0; i < 8; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { 
dg-final { scan-assembler {(?n)fmaddsub...pd[ \t].*%zmm[0-9]} } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c new file mode 100644 index 00000000000..ae196c5ef48 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmaddsubXXXps.c @@ -0,0 +1,50 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */ + +#include "fma-check.h" +void __attribute__((noipa)) +check_fmaddsub (float * __restrict a, float *b, float *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[16*i + 0] = b[16*i + 0] * c[16*i + 0] - a[16*i + 0]; + a[16*i + 1] = b[16*i + 1] * c[16*i + 1] + a[16*i + 1]; + a[16*i + 2] = b[16*i + 2] * c[16*i + 2] - a[16*i + 2]; + a[16*i + 3] = b[16*i + 3] * c[16*i + 3] + a[16*i + 3]; + a[16*i + 4] = b[16*i + 4] * c[16*i + 4] - a[16*i + 4]; + a[16*i + 5] = b[16*i + 5] * c[16*i + 5] + a[16*i + 5]; + a[16*i + 6] = b[16*i + 6] * c[16*i + 6] - a[16*i + 6]; + a[16*i + 7] = b[16*i + 7] * c[16*i + 7] + a[16*i + 7]; + a[16*i + 8] = b[16*i + 8] * c[16*i + 8] - a[16*i + 8]; + a[16*i + 9] = b[16*i + 9] * c[16*i + 9] + a[16*i + 9]; + a[16*i + 10] = b[16*i + 10] * c[16*i + 10] - a[16*i + 10]; + a[16*i + 11] = b[16*i + 11] * c[16*i + 11] + a[16*i + 11]; + a[16*i + 12] = b[16*i + 12] * c[16*i + 12] - a[16*i + 12]; + a[16*i + 13] = b[16*i + 13] * c[16*i + 13] + a[16*i + 13]; + a[16*i + 14] = b[16*i + 14] * c[16*i + 14] - a[16*i + 14]; + a[16*i + 15] = b[16*i + 15] * c[16*i + 15] + a[16*i + 15]; + } +} + +static void +fma_test (void) +{ + if (!__builtin_cpu_supports ("avx512f")) + return; + float a[16], b[16], c[16]; + for (int i = 0; i < 16; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmaddsub (a, b, c, 1); + const float d[16] = { 0., 22., 82., 192., 332., 530., 750., 1036., + 1336, 1710., 2090., 2552., 3012., 3562., 4102., 4740.}; + for (int i = 0; i < 16; ++i) + if (a[i] != d[i]) 
+ __builtin_abort (); +} + +/* { dg-final { scan-assembler {(?n)fmaddsub...ps[ \t].*%zmm[0-9]} } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c new file mode 100644 index 00000000000..cde76db1755 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXpd.c @@ -0,0 +1,41 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */ + +#include "fma-check.h" +void __attribute__((noipa)) +check_fmaddsub (double * __restrict a, double *b, double *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[8*i + 0] = b[8*i + 0] * c[8*i + 0] + a[8*i + 0]; + a[8*i + 1] = b[8*i + 1] * c[8*i + 1] - a[8*i + 1]; + a[8*i + 2] = b[8*i + 2] * c[8*i + 2] + a[8*i + 2]; + a[8*i + 3] = b[8*i + 3] * c[8*i + 3] - a[8*i + 3]; + a[8*i + 4] = b[8*i + 4] * c[8*i + 4] + a[8*i + 4]; + a[8*i + 5] = b[8*i + 5] * c[8*i + 5] - a[8*i + 5]; + a[8*i + 6] = b[8*i + 6] * c[8*i + 6] + a[8*i + 6]; + a[8*i + 7] = b[8*i + 7] * c[8*i + 7] - a[8*i + 7]; + } +} + +static void +fma_test (void) +{ + if (!__builtin_cpu_supports ("avx512f")) + return; + double a[8], b[8], c[8]; + for (int i = 0; i < 8; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmaddsub (a, b, c, 1); + const double d[8] = { 0., 20., 86., 186., 340., 520., 762., 1022.}; + for (int i = 0; i < 8; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler {(?n)fmsubadd...pd[ \t].*%zmm[0-9]} } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c new file mode 100644 index 00000000000..59de39f4112 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-vect-fmsubaddXXXps.c @@ -0,0 +1,50 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -save-temps" } */ + 
+#include "fma-check.h" +void __attribute__((noipa)) +check_fmaddsub (float * __restrict a, float *b, float *c, int n) +{ + for (int i = 0; i < n; ++i) + { + a[16*i + 0] = b[16*i + 0] * c[16*i + 0] + a[16*i + 0]; + a[16*i + 1] = b[16*i + 1] * c[16*i + 1] - a[16*i + 1]; + a[16*i + 2] = b[16*i + 2] * c[16*i + 2] + a[16*i + 2]; + a[16*i + 3] = b[16*i + 3] * c[16*i + 3] - a[16*i + 3]; + a[16*i + 4] = b[16*i + 4] * c[16*i + 4] + a[16*i + 4]; + a[16*i + 5] = b[16*i + 5] * c[16*i + 5] - a[16*i + 5]; + a[16*i + 6] = b[16*i + 6] * c[16*i + 6] + a[16*i + 6]; + a[16*i + 7] = b[16*i + 7] * c[16*i + 7] - a[16*i + 7]; + a[16*i + 8] = b[16*i + 8] * c[16*i + 8] + a[16*i + 8]; + a[16*i + 9] = b[16*i + 9] * c[16*i + 9] - a[16*i + 9]; + a[16*i + 10] = b[16*i + 10] * c[16*i + 10] + a[16*i + 10]; + a[16*i + 11] = b[16*i + 11] * c[16*i + 11] - a[16*i + 11]; + a[16*i + 12] = b[16*i + 12] * c[16*i + 12] + a[16*i + 12]; + a[16*i + 13] = b[16*i + 13] * c[16*i + 13] - a[16*i + 13]; + a[16*i + 14] = b[16*i + 14] * c[16*i + 14] + a[16*i + 14]; + a[16*i + 15] = b[16*i + 15] * c[16*i + 15] - a[16*i + 15]; + } +} + +static void +fma_test (void) +{ + if (!__builtin_cpu_supports ("avx512f")) + return; + float a[16], b[16], c[16]; + for (int i = 0; i < 16; ++i) + { + a[i] = i; + b[i] = 3*i; + c[i] = 7*i; + } + check_fmaddsub (a, b, c, 1); + const float d[16] = { 0., 20., 86., 186., 340., 520., 762., 1022., + 1352, 1692., 2110., 2530., 3036., 3536., 4130., 4710.}; + for (int i = 0; i < 16; ++i) + if (a[i] != d[i]) + __builtin_abort (); +} + +/* { dg-final { scan-assembler {(?n)fmsubadd...ps[ \t].*%zmm[0-9]} } } */ -- 2.18.1 ^ permalink raw reply [flat|nested] 8+ messages in thread
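The expected-value tables in these tests follow a closed form: with a[i] = i, b[i] = 3*i and c[i] = 7*i, each lane computes 21*i*i - i or 21*i*i + i depending on its parity. A short sketch of that closed form (the helper names are hypothetical, used only to check the tables):

```c
/* FMADDSUB expected values: even lanes 21*i*i - i, odd lanes 21*i*i + i.  */
static double
expect_fmaddsub (int i)
{
  return 21.0 * i * i + ((i & 1) ? i : -i);
}

/* FMSUBADD expected values: the per-lane signs are swapped.  */
static double
expect_fmsubadd (int i)
{
  return 21.0 * i * i + ((i & 1) ? -i : i);
}
```

For example, lane 15 of the float FMADDSUB test is 21*225 + 15 = 4740, matching the last entry of its table, and lane 15 of the FMSUBADD test is 21*225 - 15 = 4710.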
* Re: [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs 2021-07-07 7:30 ` Hongtao Liu @ 2021-07-07 7:57 ` Richard Biener 0 siblings, 0 replies; 8+ messages in thread From: Richard Biener @ 2021-07-07 7:57 UTC (permalink / raw) To: Hongtao Liu; +Cc: GCC Patches, Liu, Hongtao On Wed, 7 Jul 2021, Hongtao Liu wrote: > > > > > and I have no easy way to test things there. Handling AVX512 > > > > > should be easy as followup though. > > Here's the patch adding avx512f tests for FMADDSUB/FMSUBADD slp patterns. > Pushed to the trunk. Thanks! Richard.
end of thread, other threads:[~2021-07-07 7:57 UTC | newest] Thread overview: 8+ messages -- 2021-07-05 14:09 [PATCH] Add FMADDSUB and FMSUBADD SLP vectorization patterns and optabs Richard Biener 2021-07-05 14:25 ` Richard Biener 2021-07-05 14:38 ` Richard Biener 2021-07-06 2:16 ` Hongtao Liu 2021-07-06 7:42 ` Richard Biener 2021-07-06 8:29 ` Hongtao Liu 2021-07-07 7:30 ` Hongtao Liu 2021-07-07 7:57 ` Richard Biener