From: Tamar Christina <Tamar.Christina@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: nd <nd@arm.com>, "rguenther@suse.de" <rguenther@suse.de>,
"ook@ucw.cz" <ook@ucw.cz>
Subject: RE: [PATCH v2 8/16]middle-end: add Complex Multiply and Accumulate/Subtract and Multiply and Accumulate/Subtract with Conjugate detection
Date: Tue, 3 Nov 2020 15:06:45 +0000
Message-ID: <VI1PR08MB5325D6273A6F708BC149B6C0FF110@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <20200925142931.GA21805@arm.com>
Hi All,
This is a respin of the patch using the new approach.
Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* doc/md.texi: Document optabs.
* internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ, COMPLEX_FMS,
COMPLEX_FMS_CONJ): New.
* optabs.def (cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab):
New.
* tree-vect-slp-patterns.c (class complex_fma_pattern,
complex_fma_pattern::matches): New.
(slp_patterns): Add complex_fma_pattern.
> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar
> Christina
> Sent: Friday, September 25, 2020 3:30 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; ook@ucw.cz
> Subject: [PATCH v2 8/16]middle-end: add Complex Multiply and
> Accumulate/Subtract and Multiply and Accumulate/Subtract with Conjugate
> detection
>
> Hi All,
>
> This patch adds pattern detection for the following operations:
>
> Complex FMLA and FMLS, each with and without conjugation of the second parameter:
>
> c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)
>
> For the conjugate cases, under fast-math it also supports flipping which
> operand is conjugated, by swapping the arguments to the optab. This
> allows it to support c = conj (a) * b and c += conj (a) * b.
>
> where a, b and c are complex numbers.
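For reference, the scalar meaning of the four operations can be sketched in C. This is a minimal sketch, not code from the patch. The helper name complex_mac_ref, the interleaved {re, im} array layout and the SIGN/CONJ_B parameters are all assumptions chosen for illustration. The last case shows the operand flip described above. Because conj (a) * b equals b * conj (a), swapping the multiplicands lets the "conjugate the second operand" form cover a conjugated first operand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference, not part of the patch.  Arrays hold
   complex doubles interleaved as {re0, im0, re1, im1, ...}; N counts
   doubles, so it must be even.  SIGN is +1 for c += ... (FMA) and -1
   for c -= ... (FMS); a nonzero CONJ_B uses conj (b) instead of b.  */
static void
complex_mac_ref (double *c, const double *a, const double *b, size_t n,
                 int sign, int conj_b)
{
  assert (n % 2 == 0 && (sign == 1 || sign == -1));
  for (size_t i = 0; i < n; i += 2)
    {
      double ar = a[i], ai = a[i + 1];
      double br = b[i], bi = conj_b ? -b[i + 1] : b[i + 1];
      c[i] += sign * (ar * br - ai * bi);      /* real part */
      c[i + 1] += sign * (ar * bi + ai * br);  /* imaginary part */
    }
}
```

Take a = 1+2i stored as {1, 2} and b = 3+4i stored as {3, 4}. Then complex_mac_ref (c, a, b, 2, 1, 0) adds a * b = -5+10i to c. The flipped call complex_mac_ref (c, b, a, 2, 1, 1) adds b * conj (a) = conj (a) * b = 11-2i.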
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * doc/md.texi: Document optabs.
> * internal-fn.def (COMPLEX_FMA, COMPLEX_FMA_CONJ, COMPLEX_FMS,
> COMPLEX_FMS_CONJ): New.
> * optabs.def (cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab):
> New.
> * tree-vect-slp-patterns.c (class ComplexFMAPattern): New.
> (slp_patterns): Add ComplexFMAPattern.
>
> --
[-- Attachment #2: pr13512.patch --]
[-- Type: text/x-diff; name="pr13512.patch", Size: 9915 bytes --]
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index ddaf1abaccbd44dae11ea902ec38b474aacfb8e1..d8142f745050d963e8d15c7793fae06d9ad02020 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6143,6 +6143,50 @@ rotations @var{m} of 90 or 270.
This pattern is not allowed to @code{FAIL}.
+@cindex @code{cmla@var{m}4} instruction pattern
+@item @samp{cmla@var{m}4}
+Perform a vector floating point multiply and accumulate of complex numbers
+in operand 0, operand 1 and operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmla_conj@var{m}4} instruction pattern
+@item @samp{cmla_conj@var{m}4}
+Perform a vector floating point multiply and accumulate of complex numbers
+in operand 0, operand 1 and the conjugate of operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmls@var{m}4} instruction pattern
+@item @samp{cmls@var{m}4}
+Perform a vector floating point multiply and subtract of complex numbers
+in operand 0, operand 1 and operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmls_conj@var{m}4} instruction pattern
+@item @samp{cmls_conj@var{m}4}
+Perform a vector floating point multiply and subtract of complex numbers
+in operand 0, operand 1 and the conjugate of operand 2.
+
+The instruction must perform the operation on data loaded contiguously into the
+vectors.
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
@cindex @code{cmul@var{m}4} instruction pattern
@item @samp{cmul@var{m}4}
Perform a vector floating point multiplication of complex numbers in operand 0
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index cb41643f5e332518a0271bb8e1af4883c8bd6880..acb7d9f3bdc757437d5492a652144ba31c2ef702 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -288,6 +288,10 @@ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
/* Ternary math functions. */
DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, cmla_conj, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST, cmls_conj, ternary)
/* Unary integer ops. */
DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 9c267d422478d0011f288b1f5f62daabe3989ba7..19db9c00896cd08adfd20a01669990bbbebd79f1 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -294,6 +294,10 @@ OPTAB_D (cadd90_optab, "cadd90$a3")
OPTAB_D (cadd270_optab, "cadd270$a3")
OPTAB_D (cmul_optab, "cmul$a3")
OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
+OPTAB_D (cmla_optab, "cmla$a4")
+OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
+OPTAB_D (cmls_optab, "cmls$a4")
+OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
OPTAB_D (cos_optab, "cos$a2")
OPTAB_D (cosh_optab, "cosh$a2")
OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 2edb0117f9cbbfc40e9ed3a96120a3c88f84a68e..c2987c2afac2fbd55e2acd6b56fc13c7d3ad13c1 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -1172,6 +1172,176 @@ complex_mul_pattern::validate_p ()
return true;
}
+
+/*******************************************************************************
+ * complex_fma_pattern class
+ ******************************************************************************/
+
+class complex_fma_pattern : public complex_mul_pattern
+{
+ protected:
+ complex_fma_pattern (slp_tree *node, vec_info *vinfo)
+ : complex_mul_pattern (node, vinfo)
+ {
+ this->m_arity = 2;
+ this->m_num_args = 3;
+ }
+
+ public:
+ static vect_pattern* create (slp_tree *node, vec_info *vinfo)
+ {
+ return new complex_fma_pattern (node, vinfo);
+ }
+
+ const char* get_name ()
+ {
+ return "Complex FM(A|S)";
+ }
+
+ bool matches ();
+ bool matches (complex_operation_t op, vec<slp_tree> ops);
+};
+
+/* Pattern matcher for trying to match complex multiply and accumulate
+ and multiply and subtract patterns in the SLP tree.
+ If the operation matches then IFN is set to the operation it matched and
+ the arguments to the two replacement statements are put in M_OPS.
+
+ If no match is found then IFN is set to IFN_LAST and M_OPS is unchanged.
+
+ This function matches the patterns shaped as:
+
+ double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
+ double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
+
+ c[i] = c[i] - ax;
+ c[i+1] = c[i+1] + bx;
+
+ If a match occurred then TRUE is returned, else FALSE. */
+bool
+complex_fma_pattern::matches (complex_operation_t op1, vec<slp_tree> args0)
+{
+ this->m_ifn = IFN_LAST;
+
+ /* Find the two components. We match the complex MUL first, which reduces
+ the amount of work this pattern has to do. After that we just match the
+ head node and we're done:
+
+ * FMA: + +
+ * FMS: - +. */
+ slp_tree child = NULL;
+
+ /* We need to ignore the two_operators nodes that may also match.
+ To do so we check that the node has scalar statements and that it
+ is not a permute node, since we're looking for a normal PLUS_EXPR
+ operation. */
+ if (op1 == PLUS_MINUS)
+ {
+ child = SLP_TREE_CHILDREN (args0[1])[1];
+ }
+ else if (SLP_TREE_SCALAR_STMTS (*this->m_node).length () > 0
+ && SLP_TREE_CODE (*this->m_node) != VEC_PERM_EXPR
+ && vect_match_expression_p (*this->m_node, PLUS_EXPR))
+ {
+ if (SLP_TREE_CHILDREN (*this->m_node).length () != 2)
+ return false;
+
+ op1 = PLUS_PLUS;
+ args0.safe_splice (SLP_TREE_CHILDREN (*this->m_node));
+ child = args0[1];
+ }
+ else
+ return false;
+
+ auto_vec<slp_tree> ops;
+ internal_fn mulfn = IFN_LAST;
+ /* The accumulation step produces a tree that is the inverse of the one
+ for a normal multiply, so match the nodes in reverse. */
+ if (!vect_slp_matches_complex_mul (child, &mulfn, &ops, false,
+ op1 == PLUS_MINUS))
+ return false;
+
+ this->m_ops.create (6);
+ if (op1 == PLUS_MINUS)
+ {
+ if (mulfn == IFN_COMPLEX_MUL)
+ this->m_ifn = IFN_COMPLEX_FMS;
+ else if (mulfn == IFN_COMPLEX_MUL_CONJ)
+ this->m_ifn = IFN_COMPLEX_FMS_CONJ;
+
+ child = SLP_TREE_CHILDREN (args0[0])[0];
+ this->workset.safe_splice (SLP_TREE_CHILDREN (*this->m_node));
+ save_match ();
+ }
+ else if (op1 == PLUS_PLUS)
+ {
+ if (mulfn == IFN_COMPLEX_MUL)
+ this->m_ifn = IFN_COMPLEX_FMA;
+ else if (mulfn == IFN_COMPLEX_MUL_CONJ)
+ this->m_ifn = IFN_COMPLEX_FMA_CONJ;
+
+ /* Add doesn't generate a two_operators node, so for it we replace it
+ inline by turning the add node itself into a pattern. */
+ this->m_inplace = true;
+ this->workset.safe_push (*this->m_node);
+ child = args0[0];
+ this->m_match
+ = new vect_simple_pattern_match (this->m_arity, this->m_ifn,
+ this->m_vinfo, &this->workset,
+ this->m_num_args);
+ }
+
+ if (this->m_ifn == IFN_LAST)
+ return false;
+
+ /* The conjugate nodes have a different operand ordering; oddly enough
+ the SUB node has the same order regardless of the conjugate. This needs
+ to be made more consistent in the middle-end. */
+ if (op1 == PLUS_MINUS || mulfn == IFN_COMPLEX_MUL)
+ {
+ this->m_ops.quick_push (child);
+ this->m_ops.quick_push (ops[1]);
+ this->m_ops.quick_push (ops[0]);
+ this->m_ops.quick_push (child);
+ this->m_ops.quick_push (ops[3]);
+ this->m_ops.quick_push (ops[2]);
+ }
+ else
+ {
+ this->m_ops.quick_push (child);
+ this->m_ops.quick_push (ops[0]);
+ this->m_ops.quick_push (ops[1]);
+ this->m_ops.quick_push (child);
+ this->m_ops.quick_push (ops[2]);
+ this->m_ops.quick_push (ops[3]);
+ }
+
+ vect_build_perm_groups (&this->m_blocks[0], this->m_ops);
+
+ /* Unfortunately the sequences for a conjugate and for a rotation by 180
+ or 270 are remarkably similar, so we need to do some extra checks to
+ make sure we don't match those. */
+ if (mulfn == IFN_COMPLEX_MUL_CONJ)
+ for (unsigned i = 0; i < this->m_ops.length (); i++)
+ {
+ map_t m = this->m_blocks[i];
+ if (m.a > m.b)
+ return false;
+ }
+
+ return true;
+}
+
+bool
+complex_fma_pattern::matches ()
+{
+ auto_vec<slp_tree> args0;
+ complex_operation_t op
+ = vect_detect_pair_op (*this->m_node, true, &args0);
+ return matches (op, args0);
+}
+
+
/*******************************************************************************
* complex_operations_pattern class
******************************************************************************/
@@ -1303,6 +1473,10 @@ vect_pattern_decl_t slp_patterns[]
order patterns from the largest to the smallest. Especially if they
overlap in what they can detect. */
+ /* FMA overlaps with MUL but is the longer sequence. Because we're in a
+ post-order traversal we can't match FMA if it is included in
+ complex_operations_pattern, so it must be checked on its own. */
+ SLP_PATTERN (complex_fma_pattern),
SLP_PATTERN (complex_operations_pattern),
};
#undef SLP_PATTERN
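complex_fma_pattern::matches does two things. It picks an internal function from the head node shape and the kind of multiply found underneath. It then pushes the operands onto m_ops in one of two orders. Both steps can be summarised as a small decision table. The C sketch below is illustrative only. head_op, mul_fn, fma_fn, select_fma_fn and operand_order are hypothetical stand-ins for complex_operation_t, internal_fn and the inline logic in the patch, not GCC APIs.

```c
#include <assert.h>

/* Hypothetical stand-ins for the patch's complex_operation_t and
   internal_fn values; only the cases complex_fma_pattern uses.  */
enum head_op { HEAD_PLUS_PLUS, HEAD_PLUS_MINUS, HEAD_OTHER };
enum mul_fn { MUL_NONE, MUL_PLAIN, MUL_CONJ };
enum fma_fn { FN_LAST, FN_COMPLEX_FMA, FN_COMPLEX_FMA_CONJ,
              FN_COMPLEX_FMS, FN_COMPLEX_FMS_CONJ };

/* Mirror of the IFN selection: PLUS_PLUS gives FMA and PLUS_MINUS gives
   FMS, with the _CONJ variant chosen when the multiply was conjugated.  */
static enum fma_fn
select_fma_fn (enum head_op head, enum mul_fn mul)
{
  if (mul != MUL_PLAIN && mul != MUL_CONJ)
    return FN_LAST;
  switch (head)
    {
    case HEAD_PLUS_MINUS:
      return mul == MUL_PLAIN ? FN_COMPLEX_FMS : FN_COMPLEX_FMS_CONJ;
    case HEAD_PLUS_PLUS:
      return mul == MUL_PLAIN ? FN_COMPLEX_FMA : FN_COMPLEX_FMA_CONJ;
    default:
      return FN_LAST;
    }
}

/* Fill ORDER with the sequence the patch pushes onto m_ops: 0 stands for
   the accumulator CHILD and 1..4 for ops[0]..ops[3].  The PLUS_MINUS head
   and plain multiplies swap each multiplicand pair; the PLUS_PLUS
   conjugate case keeps them in order.  */
static void
operand_order (enum head_op head, enum mul_fn mul, int order[6])
{
  assert (mul != MUL_NONE);
  int swap = (head == HEAD_PLUS_MINUS || mul == MUL_PLAIN);
  order[0] = 0;
  order[1] = swap ? 2 : 1;
  order[2] = swap ? 1 : 2;
  order[3] = 0;
  order[4] = swap ? 4 : 3;
  order[5] = swap ? 3 : 4;
}
```

With the table written out this way, it is easy to check that each (head, multiply) combination maps to exactly one of the four new internal functions. The only combination that keeps the multiplicands unswapped is PLUS_PLUS with a conjugated multiply.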