From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>,
Richard Sandiford <Richard.Sandiford@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, "ook@ucw.cz" <ook@ucw.cz>
Subject: RE: [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection
Date: Tue, 3 Nov 2020 15:06:39 +0000 [thread overview]
Message-ID: <VI1PR08MB532552AF751AA7903086F109FF110@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2009291242100.10073@p653.nepu.fhfr.qr>
[-- Attachment #1: Type: text/plain, Size: 7347 bytes --]
Hi All,
here is a respin with the requested changes.
I just realized I haven't updated the documentation yet but will do
so soon since I'm sure there will be feedback :)
Thanks,
Tamar
gcc/ChangeLog:
* doc/md.texi: Document optabs.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* tree-vect-slp-patterns.c (linear_loads_p, vect_slp_make_linear,
class complex_add_pattern,complex_add_pattern::matches): New.
(complex_operations_pattern::matches): Add complex_add_pattern.
> -----Original Message-----
> From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Tuesday, September 29, 2020 11:44 AM
> To: Richard Sandiford <Richard.Sandiford@arm.com>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org;
> nd <nd@arm.com>; ook@ucw.cz
> Subject: Re: [PATCH v2 6/16]middle-end Add Complex Addition with rotation
> detection
>
> On Tue, 29 Sep 2020, Richard Sandiford wrote:
>
> > Tamar Christina <tamar.christina@arm.com> writes:
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a
> 4e
> > > bbed73a70f29be 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6132,6 +6132,17 @@ floating-point mode.
> > >
> > > This pattern is not allowed to @code{FAIL}.
> > >
> > > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern @item
> > > +@samp{cadd@var{m}@var{n}3} Perform a vector addition of complex
> > > +numbers in operand 1 with operand 2 rotated by @var{m} degrees
> > > +around the argand plane and storing the result in operand 0. The
> > > +instruction must perform the operation on data loaded contiguously
> > > +into the vectors.
> >
> > Nitpicking, sorry, but I think it would be better to describe the
> > layout directly rather than in terms of loads, since the preceding
> > operation might not be a load.
>
> So if we're at that and since GCC vectors do not have complex components
> can we formulate this in terms avoiding 'complex'?
> Isn't this an add of one vector to a vector with adjacant lanes swapped and
> possibly negated? Mentioning that this would match a complex add in case
> lanes happen to match up with complex real/imag parts is OK but the pattern
> should work equally well if there's no complex numbers involved?
>
> > I guess the main question is: what representation do we expect for
> > big-endian? A normal Advanced SIMD LDR would give this (for floats):
> >
> > MEMORY
> > +-----+-----+-----+-----+
> > | r0 | i0 | r1 | i1 |
> > +-----+-----+-----+-----+
> > | 0 | 1 | 2 | 3 | array numbering
> > +-----+-----+-----+-----+
> > V V V V Advanced SIMD LDR
> > +-----+-----+-----+-----+
> > | r0 | i0 | r1 | i1 |
> > +-----+-----+-----+-----+
> > | 0 | 1 | 2 | 3 | GCC lane numbering
> > +-----+-----+-----+-----+
> > | 3 | 2 | 1 | 0 | Arm lane numbering
> > +-----+-----+-----+-----+
> > MSB REGISTER LSB
> >
> > but the FC* instructions put the imaginary parts in the more
> > significant lane, so the pairs of elements above would need to be
> > reversed:
> >
> > MEMORY
> > +-----+-----+-----+-----+
> > | r0 | i0 | r1 | i1 |
> > +-----+-----+-----+-----+
> > | 0 | 1 | 2 | 3 | array numbering
> > +-----+-----+-----+-----+
> > \ / \ /
> > \ / \ /
> > X X Load and permute
> > / \ / \
> > / \ / \
> > +-----+-----+-----+-----+
> > | i0 | r0 | i1 | r1 |
> > +-----+-----+-----+-----+
> > | 0 | 1 | 2 | 3 | GCC lane numbering
> > +-----+-----+-----+-----+
> > | 3 | 2 | 1 | 0 | Arm lane numbering
> > +-----+-----+-----+-----+
> > MSB REGISTER LSB
> >
> > (Or the whole vector could be reversed.)
> >
> > We might decide that it just isn't worth doing this for Advanced SIMD.
> > But should the semantics of the optab be that:
> >
> > (1) GCC lane number 0 holds a real part, or
> > (2) the least significant lane holds a real part?
> >
> > With (1), it would be up to the target to hide the permute above.
> > With (2), the vectoriser would need to introduce the permute itself.
> >
> > I'm not sure there's a perfect answer even for Arm targets. (2)
> > matches the Advanced SIMD semantics. But for SVE, the register layout
> > follows
> > LD1 rather than LDR, and the GCC and architectural lane numbering match
> up.
> > (1) would therefore be better than (2) for SVE (and so no permute
> > would be needed for either endianness on SVE).
> >
> > > +The operation is only supported for vector modes @var{n} and with
> > > +rotations @var{m} of 90 or 270.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > > @cindex @code{ffs@var{m}2} instruction pattern @item
> > > @samp{ffs@var{m}2} Store into operand 0 one plus the index of the
> > > least significant 1-bit diff --git a/gcc/internal-fn.def
> > > b/gcc/internal-fn.def index
> > >
> 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3f
> > > b005b5af78ef31 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb,
> > > binary) DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin,
> binary)
> > > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> > > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST,
> cadd90,
> > > +binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270,
> ECF_CONST,
> > > +cadd270, binary)
> > >
> > > /* FP scales. */
> > > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git
> > > a/gcc/optabs.def b/gcc/optabs.def index
> > >
> 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f
> > > 6848e80edf936d 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2") OPTAB_D
> > > (atanh_optab, "atanh$a2") OPTAB_D (copysign_optab, "copysign$F$a3")
> > > OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > > +OPTAB_D (cadd90_optab, "cadd90$a3") OPTAB_D (cadd270_optab,
> > > +"cadd270$a3")
> > > OPTAB_D (cos_optab, "cos$a2")
> > > OPTAB_D (cosh_optab, "cosh$a2")
> > > OPTAB_D (exp10_optab, "exp10$a2")
> > > diff --git a/gcc/tree-vect-slp-patterns.c
> > > b/gcc/tree-vect-slp-patterns.c index
> > >
> 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2
> ba
> > > c736dd970be735 100644
> > > --- a/gcc/tree-vect-slp-patterns.c
> > > +++ b/gcc/tree-vect-slp-patterns.c
> > > @@ -663,12 +663,94 @@ graceful_exit:
> > > }
> > > };
> > >
> > > +class ComplexAddPattern : public ComplexPattern
> >
> > Another nitpick, sorry, but type names should be lower case rather
> > than CamelCase.
> >
> > Thanks,
> > Richard
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imend
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pr13510.patch --]
[-- Type: text/x-diff; name="pr13510.patch", Size: 9225 bytes --]
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6132,6 +6132,17 @@ floating-point mode.
This pattern is not allowed to @code{FAIL}.
+@cindex @code{cadd@var{m}@var{n}3} instruction pattern
+@item @samp{cadd@var{m}@var{n}3}
+Perform a vector addition of complex numbers in operand 1 with operand 2
+rotated by @var{m} degrees around the argand plane and storing the result in
+operand 0. The instruction must perform the operation on data loaded
+contiguously into the vectors.
+The operation is only supported for vector modes @var{n} and with
+rotations @var{m} of 90 or 270.
+
+This pattern is not allowed to @code{FAIL}.
+
@cindex @code{ffs@var{m}2} instruction pattern
@item @samp{ffs@var{m}2}
Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1148df8cf4a509b5 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
+
/* FP scales. */
DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
OPTAB_D (atanh_optab, "atanh$a2")
OPTAB_D (copysign_optab, "copysign$F$a3")
OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
OPTAB_D (cos_optab, "cos$a2")
OPTAB_D (cosh_optab, "cosh$a2")
OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 65044a77d55e24cde6c663e81c11b66ad9650056..0732cf0a6d93be8590b84c39dff82940b280e46b 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -157,6 +157,114 @@ typedef enum _complex_operation : unsigned {
typedef struct map_ { int a; int b; } map_t;
+/*******************************************************************************
+ * General helper functions
+ ******************************************************************************/
+
+/* Check to see if all loads rooted in ROOT are linear. Linearity is
+ defined as having no gaps between values loaded. */
+
+static load_permutation_t
+linear_loads_p (slp_tree root, bool *linear)
+{
+ *linear = false;
+ if (!root)
+ return vNULL;
+
+ unsigned i;
+ load_permutation_t loads = vNULL;
+
+ if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
+ {
+ loads = SLP_TREE_LOAD_PERMUTATION (root);
+ unsigned leader = loads[0];
+ unsigned load;
+ FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
+ if (load != ++leader)
+ return loads;
+ }
+
+ slp_tree child;
+ FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
+ {
+ loads = linear_loads_p (child, linear);
+ if (!*linear)
+ return loads;
+ }
+
+ *linear = true;
+ return loads;
+}
+
+/* This function attempts to make a node rooted in NODE linear. If the node
+ if already linear than the node itself is returned in RESULT.
+
+ If the node is not linear then a new VEC_PERM_EXPR node is created with a
+ lane permute that when applied will make the node linear. If such a
+ permute cannot be created then FALSE is returned from the function.
+
+ Here linearity is defined as having a sequential, monotically increasing
+ load position inside the load permute generated by the loads reachable from
+ NODE. */
+
+static bool
+vect_slp_make_linear (slp_tree parent, slp_tree node, slp_tree *result)
+{
+ bool is_linear = false;
+ unsigned x, val;
+ load_permutation_t load_perm = linear_loads_p (node, &is_linear);
+ if (is_linear)
+ {
+ *result = node;
+ return true;
+ }
+
+ /* Attempt to linearise the permute. */
+ vec<std::pair<unsigned, unsigned> > zipped;
+ zipped.create (load_perm.length ());
+ FOR_EACH_VEC_ELT (load_perm, x, val)
+ zipped.quick_push (std::make_pair (val, x));
+ typedef const std::pair<unsigned, unsigned>* cmp_t;
+ zipped.qsort ([](const void *a, const void *b) -> int
+ { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
+
+ /* Verify if we have a linear permute sequence. */
+ if (zipped.length () > 0)
+ {
+ unsigned leader = zipped[0].first;
+ for (x = 1; x < zipped.length (); x++)
+ if(!(is_linear = (zipped[x].first == ++leader)))
+ break;
+ }
+ else
+ is_linear = true;
+
+ if (!is_linear)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "Loads could not be made linear %p\n",
+ node);
+ zipped.release ();
+ return false;
+ }
+
+ for (x = 0; x < zipped.length (); x++)
+ zipped[x].first = 0;
+
+ /* Create the new permute node and store it instead. */
+ slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
+ SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
+ SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
+ SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
+ SLP_TREE_CHILDREN (vnode).quick_push (node);
+ SLP_TREE_REF_COUNT (vnode) = 1;
+ SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
+ SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent);
+ *result = vnode;
+ return is_linear;
+}
+
/*******************************************************************************
* Simple vector pattern matcher
******************************************************************************/
@@ -532,6 +640,93 @@ complex_pattern::validate_p ()
return true;
}
+
+/*******************************************************************************
+ * complex_add_pattern class
+ ******************************************************************************/
+
+class complex_add_pattern : public complex_pattern
+{
+ protected:
+ complex_add_pattern (slp_tree *node, vec_info *vinfo)
+ : complex_pattern (node, vinfo)
+ {
+ this->m_arity = 2;
+ this->m_num_args = 2;
+ }
+
+ public:
+ static vect_pattern* create (slp_tree *node, vec_info *vinfo)
+ {
+ return new complex_add_pattern (node, vinfo);
+ }
+
+ const char* get_name ()
+ {
+ return "Complex Addition";
+ }
+
+ bool matches ();
+ bool matches (complex_operation_t op, vec<slp_tree> ops);
+};
+
+/* Pattern matcher for trying to match complex addition pattern in SLP tree.
+
+ If no match is found then IFN is set to IFN_LAST.
+ This function matches the patterns shaped as:
+
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+
+ If a match occurred then TRUE is returned, else FALSE. */
+
+bool
+complex_add_pattern::matches (complex_operation_t op, vec<slp_tree> ops)
+{
+ this->m_ifn = IFN_LAST;
+ this->m_ops.safe_splice (ops);
+
+ /* Find the two components. Rotation in the complex plane will modify
+ the operations:
+
+ * Rotation 0: + +
+ * Rotation 90: - +
+ * Rotation 180: - -
+ * Rotation 270: + -
+
+ Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+ to care about them here. */
+ if (op == MINUS_PLUS)
+ this->m_ifn = IFN_COMPLEX_ADD_ROT90;
+ else if (op == PLUS_MINUS)
+ this->m_ifn = IFN_COMPLEX_ADD_ROT270;
+
+ /* verify that there is a permute, otherwise this isn't a pattern we
+ we support. */
+ bool is_linear = false;
+ bool all_linear = true;
+ unsigned x;
+ for (x = 0; x < this->m_ops.length () && all_linear; x++)
+ {
+ linear_loads_p (this->m_ops[x], &is_linear);
+ all_linear = all_linear && is_linear;
+ }
+
+ if (this->m_ifn == IFN_LAST || all_linear)
+ return false;
+
+ save_match ();
+ return true;
+}
+
+bool
+complex_add_pattern::matches ()
+{
+ complex_operation_t op
+ = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
+ return matches (op, this->m_ops);
+}
+
/*******************************************************************************
* complex_operations_pattern class
******************************************************************************/
@@ -580,6 +775,13 @@ complex_operations_pattern::matches ()
if (op == CMPLX_NONE)
return false;
+ /* Check which pattern this may be. Match longest pattern first. */
+ this->m_patt = complex_add_pattern::create (this->m_node, this->m_vinfo);
+ if (this->m_patt->matches (op, this->m_ops))
+ return true;
+
+ delete this->m_patt;
+
this->m_patt = NULL;
return false;
}
prev parent reply other threads:[~2020-11-03 15:06 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-25 14:28 Tamar Christina
2020-09-29 10:02 ` Richard Sandiford
2020-09-29 10:44 ` Richard Biener
2020-11-03 15:06 ` Tamar Christina [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=VI1PR08MB532552AF751AA7903086F109FF110@VI1PR08MB5325.eurprd08.prod.outlook.com \
--to=tamar.christina@arm.com \
--cc=Richard.Sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=nd@arm.com \
--cc=ook@ucw.cz \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).