From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, "ook@ucw.cz" <ook@ucw.cz>,
"hongtao.liu@intel.com" <hongtao.liu@intel.com>
Subject: RE: [PATCH 2/2] middle-end : Add support for complex pattern detection for Add, Mul, FMA and FMS
Date: Mon, 16 Nov 2020 15:54:30 +0000 [thread overview]
Message-ID: <VI1PR08MB532584C37C8F6A2F9FFBD681FFE30@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2011161509060.10073@p653.nepu.fhfr.qr>
Hi Richi, thanks for the review!
Just a quick comment on one of the questions asked:
> -----Original Message-----
> From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Monday, November 16, 2020 3:12 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> hongtao.liu@intel.com
> Subject: Re: [PATCH 2/2] middle-end : Add support for complex pattern
> detection for Add, Mul, FMA and FMS
>
> On Sat, 14 Nov 2020, Tamar Christina wrote:
>
> > Hi All,
> >
> > This patch series adds support for SLP vectorization of complex instructions
> [1].
> >
> > These instructions exist only in their vector forms and require you to
> recognize
> > two statements in parallel. Complex operations usually require a permute
> due to
> > the fact that the real and imaginary numbers are stored intermixed but
> these vector
> > instructions expect this and no longer need the compiler to generate a
> permute.
> >
> > The pass is able to insert additional permutes inside the SLP tree in order to
> > change the data flow such that the new instructions can be used. These
> permutes
> > are then optimized out later during optimize_slp.
> >
> > These instructions also support rotations along the Argand plane, as such
> the operands
> > have to be re-ordered to coincide with their load group.
> >
> > For now, this patch only adds support for:
> >
> > * Complex Addition with rotation of 90 and 270.
> >
> > Addition with rotation of the second argument around the Argand plane.
> > Supported rotations are 90 and 180.
> >
> > c = a + (b * I) and c = a + (b * I * I * I)
> >
> > * Complex Multiplication and Multiplication where one operand is
> conjucated.
> >
> > Complex multiplication and Conjucate Complex multiplication of the
> second
> > parameter.
> >
> > c = a * b and c = a * conj (b)
> >
> > * Complex FMA and FMA where one operand is conjucated.
> > * Complex FMS and FMS where one operand is conjucated.
> >
> > Complex FMLA, Conjucate FMLA of the second parameter and FMLS.
> >
> > c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)
> >
> > For the conjucate cases it supports under fast-math that the operands
> that is
> > being conjucated be flipped by flipping the arguments to the optab. This
> > allows it to support c = conj (a) * b and c += conj (a) * b.
> >
> > where a, b and c are complex numbers.
> >
> > Complex dot-product is not currently supported in this patch set as
> build_slp fails
> > for it. This will be provided as a future patch.
> >
> > These are supported for both integer and floating point and as such these
> don't look
> > for real or imaginary pairs but instead rely on the early lowering of complex
> > numbers by GCC and canonization of the operations such that it just
> recognizes any
> > instruction sequence matching the operations requested.
> >
> > To be safe when the it is not sure it can support the operation or if it finds
> > something it does not understand it backs off. This is done by looking at
> the
> > order of the loads below the nodes. Anything that changes the order like a
> > VEC_PERM is taken into account. The order of the loads define which
> operation
> > is being done. For instance a complex add with conjucate always has the
> second
> > loads in non-contiguous order. Because this information is queried often it
> is
> > heavily cached and because we never modify nodes in place we can cache
> the
> > operation through all SLP instances.
> >
> > When SLP failes (e.g. due to costing) we dissolve the changes and restore
> the
> > previous relevancies of the stmts that were in a pattern before.
> >
> > [1] https://developer.arm.com/documentation/ddi0487/fc/
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? (working on moving tests to middend IFN tests instead).
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-slp-patterns.c: New file.
> > * Makefile.in: Use it.
> > * doc/passes.texi: Document it.
> > * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270,
> > COMPLEX_MUL, COMPLEX_MUL_CONJ, COMPLEX_FMA,
> COMPLEX_FMA_CONJ):
> > COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
> > * optabs.def (cadd90_optab, cadd270_optab, cmul_optab,
> cmul_conj_optab,
> > cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab): New.
> > * doc/md.texi: Document them
> >
> > -- inline copy of patch --
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index
> 273654cfa2525099ed0d9adc2f9085c8408b096f..4b556bdbbc2f989d894ef2dbe
> 894318858eaa7aa 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1646,6 +1646,7 @@ OBJS = \
> > tree-vect-loop.o \
> > tree-vect-loop-manip.o \
> > tree-vect-slp.o \
> > + tree-vect-slp-patterns.o \
> > tree-vectorizer.o \
> > tree-vector-builder.o \
> > tree-vrp.o \
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index
> 813875b973bc73920f1195c4897ce3491ca713c2..68792e4c3eeca2bce0112fc28e
> ba38700e66dc00 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -1493,7 +1493,7 @@ can be described by a series of letters for each
> operand. The overall
> > constraint for an operand is made from the letters for this operand
> > from the first alternative, a comma, the letters for this operand from
> > the second alternative, a comma, and so on until the last alternative.
> > -All operands for a single instruction must have the same number of
> > +All operands for a single instruction must have the same number of
> > alternatives.
> > @ifset INTERNALS
> > Here is how it is done for fullword logical-or on the 68000:
> > @@ -1558,17 +1558,17 @@ the first, 1 for the second alternative, etc.).
> @xref{Output Statement}.
> > @end ifset
> > @ifclear INTERNALS
> >
> > -So the first alternative for the 68000's logical-or could be written as
> > -@code{"+m" (output) : "ir" (input)}. The second could be @code{"+r"
> > -(output): "irm" (input)}. However, the fact that two memory locations
> > -cannot be used in a single instruction prevents simply using @code{"+rm"
> > -(output) : "irm" (input)}. Using multi-alternatives, this might be
> > +So the first alternative for the 68000's logical-or could be written as
> > +@code{"+m" (output) : "ir" (input)}. The second could be @code{"+r"
> > +(output): "irm" (input)}. However, the fact that two memory locations
> > +cannot be used in a single instruction prevents simply using @code{"+rm"
> > +(output) : "irm" (input)}. Using multi-alternatives, this might be
> > written as @code{"+m,r" (output) : "ir,irm" (input)}. This describes
> > -all the available alternatives to the compiler, allowing it to choose
> > +all the available alternatives to the compiler, allowing it to choose
> > the most efficient one for the current conditions.
> >
> > -There is no way within the template to determine which alternative was
> > -chosen. However you may be able to wrap your @code{asm} statements
> with
> > +There is no way within the template to determine which alternative was
> > +chosen. However you may be able to wrap your @code{asm} statements
> with
> > builtins such as @code{__builtin_constant_p} to achieve the desired
> results.
> > @end ifclear
> >
> > @@ -3062,7 +3062,7 @@ instruction taking only the upper 16-bits of a
> > 16-bits being 0.
> >
> > @item L
> > -Integer that is valid as an immediate operand for a
> > +Integer that is valid as an immediate operand for a
> > shift instruction. Range 0 to 31.
> >
> > @item M
> > @@ -3076,7 +3076,7 @@ Integer that is valid as an immediate operand for
> > a custom instruction opcode. Range 0 to 255.
> >
> > @item P
> > -An immediate operand for R2 andchi/andci instructions.
> > +An immediate operand for R2 andchi/andci instructions.
> >
> > @item S
> > Matches immediates which are addresses in the small
> > @@ -6132,6 +6132,97 @@ floating-point mode.
> >
> > This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> > +@item @samp{cadd@var{m}@var{n}3}
> > +Perform a vector addition of complex numbers in operand 1 with operand
> 2
> > +rotated by @var{m} degrees around the argand plane and storing the
> result in
> > +operand 0. The instruction must perform the operation on data that in
> GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
>
> So I'm not sure I'm happy with defining this in terms of 'complex
> numbers'. At least we nowhere specify how complex numbers are laid out in
> vector registers. For V4DF it could be any of {real0,imag0,real1,imag1},
> {real0,real1, imag0,imag1}, {imag0,real0,imag1,real1} or
> {imag0,imag1, real0,real1}. While it's easy to give names to semantics
> this way I'd prefer to define this in terms of even/odd lanes of a
> vector. Maybe it can refer to the aarch64 example of complex number
> instructions implementing such scheme.
>
> So at _least_ we have to define the layout of complex components
> in the vector modes (preferably in a target independent way), say
> even lanes are real parts and odd lanes are imaginary parts?
>
> > +
> > +The operation is only supported for vector modes @var{n} and with
> > +rotations @var{m} of 90 or 270.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmla@var{m}4} instruction pattern
> > +@item @samp{cmla@var{m}4}
> > +Perform a vector floating point multiply and accumulate of complex
> numbers
> > +in operand 0, operand 1 and operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmla_conj@var{m}4} instruction pattern
> > +@item @samp{cmla_conj@var{m}4}
> > +Perform a vector floating point multiply and accumulate of complex
> numbers
> > +in operand 0, operand 1 and the conjucate of operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmls@var{m}4} instruction pattern
> > +@item @samp{cmls@var{m}4}
> > +Perform a vector floating point multiply and subtract of complex numbers
> > +in operand 0, operand 1 and operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmls_conj@var{m}4} instruction pattern
> > +@item @samp{cmls_conj@var{m}4}
> > +Perform a vector floating point multiply and subtract of complex numbers
> > +in operand 0, operand 1 and the conjucate of operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmul@var{m}4} instruction pattern
> > +@item @samp{cmul@var{m}4}
> > +Perform a vector floating point multiplication of complex numbers in
> operand 0
> > +and operand 1.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmul_conj@var{m}4} instruction pattern
> > +@item @samp{cmul_conj@var{m}4}
> > +Perform a vector floating point multiplication of complex numbers in
> operand 0
> > +and the conjucate of operand 1.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > @cindex @code{ffs@var{m}2} instruction pattern
> > @item @samp{ffs@var{m}2}
> > Store into operand 0 one plus the index of the least significant 1-bit
> > @@ -7409,7 +7500,7 @@ If this pattern is not defined, then a
> @code{memory_barrier} pattern
> > will be emitted, followed by a store of the value to the memory operand.
> >
> > @cindex @code{atomic_compare_and_swap@var{mode}} instruction
> pattern
> > -@item @samp{atomic_compare_and_swap@var{mode}}
> > +@item @samp{atomic_compare_and_swap@var{mode}}
> > This pattern, if defined, emits code for an atomic compare-and-swap
> > operation with memory model semantics. Operand 2 is the memory on
> which
> > the atomic operation is performed. Operand 0 is an output operand which
> > @@ -7424,7 +7515,7 @@ if the operation fails.
> >
> > If memory referred to in operand 2 contains the value in operand 3, then
> > operand 4 is stored in memory pointed to by operand 2 and fencing based
> on
> > -the memory model in operand 6 is issued.
> > +the memory model in operand 6 is issued.
> >
> > If memory referred to in operand 2 does not contain the value in operand
> 3,
> > then fencing based on the memory model in operand 7 is issued.
> > @@ -7500,9 +7591,9 @@ none of these are available a compare-and-swap
> loop will be used.
> > @itemx @samp{atomic_fetch_or@var{mode}},
> @samp{atomic_fetch_and@var{mode}}
> > @itemx @samp{atomic_fetch_xor@var{mode}},
> @samp{atomic_fetch_nand@var{mode}}
> > These patterns emit code for an atomic operation on memory with
> memory
> > -model semantics, and return the original value. Operand 0 is an output
> > -operand which contains the value of the memory location before the
> > -operation was performed. Operand 1 is the memory on which the atomic
> > +model semantics, and return the original value. Operand 0 is an output
> > +operand which contains the value of the memory location before the
> > +operation was performed. Operand 1 is the memory on which the atomic
> > operation is performed. Operand 2 is the second operand to the binary
> > operator. Operand 3 is the memory model to be used by the operation.
> >
> > @@ -7859,7 +7950,7 @@ simplified) from the PDP-11 target:
> > (plus:HI (match_dup 0)
> > (const_int -1)))]
> > ""
> > -
> > +
> > @{
> > if (which_alternative == 0)
> > return "sob %0,%l1";
> > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> > index
> a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99
> a23386891a7b0c1 100644
> > --- a/gcc/doc/passes.texi
> > +++ b/gcc/doc/passes.texi
> > @@ -709,7 +709,8 @@ loop.
> > The pass is implemented in @file{tree-vectorizer.c} (the main driver),
> > @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific
> parts
> > and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
> > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
> > +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
> > Analysis of data references is in @file{tree-data-ref.c}.
> >
> > SLP Vectorization. This pass performs vectorization of straight-line code.
> The
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index
> 310d37aa53819791b5df1683afca831f08e5892a..acb7d9f3bdc757437d5492a652
> 144ba31c2ef702 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -277,12 +277,21 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST,
> scalb, binary)
> > DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90,
> binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST,
> cadd270, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST,
> cmul_conj, binary)
> > +
> >
> > /* FP scales. */
> > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> >
> > /* Ternary math functions. */
> > DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST,
> cmla_conj, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST,
> cmls_conj, ternary)
> >
> > /* Unary integer ops. */
> > DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb,
> unary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index
> 78409aa14537d259bf90277751aac00d452a0d3f..19db9c00896cd08adfd20a0166
> 9990bbbebd79f1 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -290,6 +290,14 @@ OPTAB_D (atan_optab, "atan$a2")
> > OPTAB_D (atanh_optab, "atanh$a2")
> > OPTAB_D (copysign_optab, "copysign$F$a3")
> > OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > +OPTAB_D (cadd270_optab, "cadd270$a3")
> > +OPTAB_D (cmul_optab, "cmul$a3")
> > +OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> > +OPTAB_D (cmla_optab, "cmla$a4")
> > +OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> > +OPTAB_D (cmls_optab, "cmls$a4")
> > +OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
> > OPTAB_D (cos_optab, "cos$a2")
> > OPTAB_D (cosh_optab, "cosh$a2")
> > OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..7a2c64aeb9f212cd3ee18adf11
> 0e6ac6696249fd
> > --- /dev/null
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -0,0 +1,1844 @@
> > +/* SLP - Pattern matcher on SLP trees
> > + Copyright (C) 2020 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
> License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3. If not see
> > +<http://www.gnu.org/licenses/>. */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "target.h"
> > +#include "rtl.h"
> > +#include "tree.h"
> > +#include "gimple.h"
> > +#include "tree-pass.h"
> > +#include "ssa.h"
> > +#include "optabs-tree.h"
> > +#include "insn-config.h"
> > +#include "recog.h" /* FIXME: for insn_data */
> > +#include "fold-const.h"
> > +#include "stor-layout.h"
> > +#include "gimple-iterator.h"
> > +#include "cfgloop.h"
> > +#include "tree-vectorizer.h"
> > +#include "langhooks.h"
> > +#include "gimple-walk.h"
> > +#include "dbgcnt.h"
> > +#include "tree-vector-builder.h"
> > +#include "vec-perm-indices.h"
> > +#include "gimple-fold.h"
> > +#include "internal-fn.h"
> > +
> > +/* SLP Pattern matching mechanism.
> > +
> > + This extension to the SLP vectorizer allows one to transform the
> generated SLP
> > + tree based on any pattern. The difference between this and the normal
> vect
> > + pattern matcher is that unlike the former, this matcher allows you to
> match
> > + with instructions that do not belong to the same SSA dominator graph.
> > +
> > + The only requirement that this pattern matcher has is that you are only
> > + only allowed to either match an entire group or none.
> > +
> > + The pattern matcher currently only allows you to perform replacements
> to
> > + internal functions.
> > +
> > + Once the patterns are matched it is one way, these cannot be undone. It
> is
> > + currently not supported to match patterns recursively.
> > +
> > + To add a new pattern, implement the vect_pattern class and add the
> type to
> > + slp_patterns.
> > +
> > +*/
> > +
> >
> +/*********************************************************
> **********************
> > + * vect_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* Default implementation of recognize that peforms matching, validation
> and
> > + replacement of nodes but that can be overriden if required. */
> > +
> > +bool
> > +vect_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > + vec_info *vinfo)
> > +{
> > + if (this->matches (perm_cache))
> > + {
> > + tree vectype = SLP_TREE_VECTYPE (*this->m_node);
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Found %s pattern in SLP tree\n",
> > + this->get_name ());
> > +
> > + if (this->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Target supports %s vectorization with
> mode %T\n",
> > + internal_fn_name (this->get_ifn ()), vectype);
> > + }
> > + else
> > + {
> > + if (dump_enabled_p ())
> > + {
> > + if (!vectype)
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Target does not support vector type
> for %T\n",
> > + SLP_TREE_DEF_TYPE (*this->m_node));
> > + else
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Target does not support %s for vector type
> "
> > + "%T\n", internal_fn_name (this->get_ifn ()),
> > + vectype);
> > + }
> > + return false;
> > + }
> > + }
> > + else
> > + return false;
> > +
> > + if (!this->validate_p (perm_cache))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "transformation for %s not valid due to post "
> > + "condition\n",
> > + internal_fn_name (this->get_ifn ()));
> > + return false;
> > + }
> > +
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location, "Creating vec
> patterns\n");
> > +
> > + gcall* call_stmt = this->build (vinfo);
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location, "\t %p stmt: %G",
> > + *this->m_node, call_stmt);
> > +
> > + return true;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * General helper types
> > +
> **********************************************************
> ********************/
> > +
> > +/* The COMPLEX_OPERATION enum denotes the possible pair of
> operations that can
> > + be matched when looking for expressions that we are interested
> matching for
> > + complex numbers addition and mla. */
> > +
> > +typedef enum _complex_operation : unsigned {
> > + PLUS_PLUS,
> > + MINUS_PLUS,
> > + PLUS_MINUS,
> > + MULT_MULT,
> > + CMPLX_NONE
> > +} complex_operation_t;
> > +
> > +/* A simple pair type used for ordering of combine operations. */
> > +
> > +typedef struct map_ { int a; int b; } map_t;
> > +
> >
> +/*********************************************************
> **********************
> > + * General helper functions
> > +
> **********************************************************
> ********************/
> > +
> > +/* Helper function of linear_loads_p that checks to see if the load
> permutation
> > + is sequential and in monotonically increasing order of loads with no gaps.
> > +*/
> > +
> > +static inline bool
> > +is_linear_load_p (load_permutation_t loads)
> > +{
> > + if (loads.length() == 0)
> > + return false;
> > +
> > + unsigned leader = loads[0];
> > + unsigned load, i;
> > + FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
> > + if (load != ++leader)
> > + return false;
> > + return true;
> > +}
> > +
> > +
> > +/* Check to see if all loads rooted in ROOT are linear. Linearity is
> > + defined as having no gaps between values loaded. */
> > +
> > +static load_permutation_t
> > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree
> root,
> > + bool *linear)
> > +{
> > + *linear = false;
> > + if (!root)
> > + return vNULL;
> > +
> > + unsigned i;
> > + load_permutation_t loads = vNULL;
> > + load_permutation_t *tmp;
> > +
> > + if ((tmp = perm_cache->get (root)) != NULL)
> > + {
> > + *linear = is_linear_load_p (*tmp);
> > + return *tmp;
> > + }
> > +
> > + perm_cache->put (root, vNULL);
> > +
> > + /* If it's a load node, then just read the load permute. */
> > + if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> > + {
> > + loads = SLP_TREE_LOAD_PERMUTATION (root);
> > + perm_cache->put (root, loads);
> > + if (!is_linear_load_p (loads))
> > + return loads;
> > + }
> > + else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> > + {
> > + loads.create (SLP_TREE_LANES (root));
> > + tree op;
> > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op)
> > + {
> > + gimple *defstmt = SSA_NAME_DEF_STMT (op);
> > + if (!is_gimple_assign (defstmt))
> > + return vNULL;
> > +
> > + switch (gimple_assign_rhs_code (defstmt))
> > + {
> > + case IMAGPART_EXPR:
> > + loads.safe_push (1);
> > + break;
> > + case REALPART_EXPR:
> > + loads.safe_push (0);
> > + break;
> > + default:
> > + {
> > + loads.release ();
> > + return vNULL;
> > + }
> > + }
> > + }
> > +
> > + perm_cache->put (root, loads);
> > + if (!is_linear_load_p (loads))
> > + return loads;
> > + }
> > + else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> > + return vNULL;
> > +
> > +
> > +
> > + auto_vec<load_permutation_t> all_loads;
> > + bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> > +
> > + slp_tree child;
> > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > + {
> > + /* Seed it with a value to prevent infinite recursions. */
> > + loads = linear_loads_p (perm_cache, child, linear);
> > + if ((!*linear && !is_perm) || !loads.exists ())
> > + return loads;
> > +
> > + all_loads.safe_push (loads);
> > + }
> > +
> > + if (is_perm)
> > + {
> > + lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
> > + load_permutation_t nloads;
> > + nloads.create (SLP_TREE_LANES (root));
> > + nloads.quick_grow (SLP_TREE_LANES (root));
> > + for (i = 0; i < SLP_TREE_LANES (root); i++)
> > + nloads[i] = all_loads[perm[i].first][perm[i].second];
> > +
> > + perm_cache->put (root, nloads);
> > + if (!is_linear_load_p (nloads))
> > + return nloads;
> > + loads = nloads;
> > + }
> > +
> > + perm_cache->put (root, loads);
> > + *linear = true;
> > + return loads;
> > +}
> > +
> > +/* Builds a permutation group from the operands in OPS and stores it in
> BLOCKS.
> > + The group describes how to combine the operators to get a valid linear
> node.
> > +
> > + This is used when combining multiple children from a two_operators
> node into
> > + one using a lane permute to select the appropriate lane. As an example
> the
> > + permute { [0 0] [1 4] [2 2] [3 3] [1 4] [5 5] } says the nodes which occur
> > + twice in a group, e.g [0 0] only needs itself to possibly be made linear
> > + whereas [1 4] means to combine the nodes 1 and 4. */
> > +
> > +static void
> > +vect_build_perm_groups (slp_tree_to_load_perm_map_t *perm_cache,
> > + map_t *blocks, vec<slp_tree> ops)
> > +{
> > + slp_tree op;
> > + unsigned i;
> > + bool is_linear = false;
> > + unsigned min_eq = -1, max_eq = 0;
> > + unsigned min_idx = 0, max_idx = 0;
> > + FOR_EACH_VEC_ELT (ops, i, op)
> > + {
> > + load_permutation_t perms = linear_loads_p (perm_cache, op,
> &is_linear);
> > + unsigned x, imin = -1, imax = 0;
> > + for (x = 0; x < perms.length () && !is_linear; x++)
> > + {
> > + imin = MIN (imin, perms[x]);
> > + imax = MAX (imax, perms[x]);
> > + }
> > +
> > + if (imin != imax || perms.length () == 0 || is_linear)
> > + blocks[i] = {i, i};
> > + else
> > + {
> > + if (imin <= min_eq)
> > + {
> > + min_eq = imin;
> > + min_idx = i;
> > + }
> > +
> > + if (imin >= max_eq)
> > + {
> > + max_eq = imin;
> > + max_idx = i;
> > + }
> > + }
> > + }
> > +
> > + /* Now fill in the gap. */
> > + blocks[min_idx] = {min_idx, max_idx};
> > + blocks[max_idx] = {min_idx, max_idx};
> > +
> > + if (dump_enabled_p ())
> > + {
> > + dump_printf_loc (MSG_NOTE, vect_location, "pattern group: { ");
> > + for (i = 0; i < ops.length (); i++)
> > + dump_printf (MSG_NOTE,"[%d %d] ", blocks[i].a, blocks[i].b);
> > + dump_printf (MSG_NOTE,"}\n");
> > + }
> > +
> > +}
> > +
> > +/* This function attempts to make a node rooted in NODE with parent
> PARENT
> > + linear. If the node if already linear than the node itself is returned
> > + in RESULT.
> > +
> > + If the node is not linear then a new VEC_PERM_EXPR node is created
> with a
> > + lane permute that when applied will make the node linear. If such a
> > + permute cannot be created then FALSE is returned from the function.
> > +
> > + Here linearity is defined as having a sequential, monotically increasing
> > + load position inside the load permute generated by the loads reachable
> from
> > + NODE. */
> > +
> > +static bool
> > +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_tree parent, slp_tree node, slp_tree *result)
> > +{
> > + bool is_linear = false;
> > + unsigned x, val;
> > + load_permutation_t load_perm = linear_loads_p (perm_cache, node,
> &is_linear);
> > + if (is_linear)
> > + {
> > + *result = node;
> > + return true;
> > + }
> > +
> > + /* Attempt to linearise the permute. */
> > + vec<std::pair<unsigned, unsigned> > zipped;
> > + zipped.create (load_perm.length ());
> > + FOR_EACH_VEC_ELT (load_perm, x, val)
> > + zipped.quick_push (std::make_pair (val, x));
> > +
> > + typedef const std::pair<unsigned, unsigned>* cmp_t;
> > + zipped.qsort ([](const void *a, const void *b) -> int
> > + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> > +
> > + /* Verify if we have a linear permute sequence. */
> > + if (zipped.length () > 0)
> > + {
> > + unsigned leader = zipped[0].first;
> > + for (x = 1; x < zipped.length (); x++)
> > + if(!(is_linear = (zipped[x].first == ++leader)))
> > + break;
> > + }
> > +
> > + if (!is_linear)
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Loads could not be made linear %p\n",
> > + node);
> > + zipped.release ();
> > + return false;
> > + }
> > +
> > + for (x = 0; x < zipped.length (); x++)
> > + zipped[x].first = 0;
> > +
> > + /* Create the new permute node and store it instead. */
> > + slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > + SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
> > + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > + SLP_TREE_CHILDREN (vnode).quick_push (node);
> > + SLP_TREE_REF_COUNT (vnode) = 1;
> > + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
> > + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> (parent);
> > + *result = vnode;
> > + return is_linear;
> > +}
> > +
> > +/* Helper utility to check to see if the permutation PERM is one that can
> be
> > + used in a node combination operation. This is defined as the permute
> not
> > + having all the elements being the same. e.g [0 0]. */
> > +
> > +static inline bool
> > +vect_can_combine_node_p (load_permutation_t perm, bool is_linear)
> > +{
> > + if (is_linear)
> > + return false;
> > +
> > + unsigned i, x;
> > + FOR_EACH_VEC_ELT (perm, i, x)
> > + if (perm[0] != x)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* This function combines the nodes in MAP together to make a new
> node using a
> > + lane permute. The nodes to combine are stored in ENTRIES and the
> resulting
> > + node is returned in RESULT.
> > +
> > + If the nodes are already linear then this function fails and returns FALSE.
> > + Otherwise it returns the new node and TRUE. */
> > +
> > +static bool
> > +vect_slp_make_combine_linear (slp_tree_to_load_perm_map_t
> *perm_cache,
> > + slp_tree parent, vec<slp_tree> entries, map_t
> map,
> > + slp_tree *result)
> > +{
> > + if (map.a == map.b)
> > + return false;
> > +
> > + slp_tree node_a = entries[map.a];
> > + slp_tree node_b = entries[map.b];
> > +
> > + bool is_a_linear = false;
> > + bool is_b_linear = false;
> > +
> > + load_permutation_t load_perm_a
> > + = linear_loads_p (perm_cache, node_a, &is_a_linear);
> > + if (!vect_can_combine_node_p (load_perm_a, is_a_linear))
> > + return false;
> > +
> > + load_permutation_t load_perm_b
> > + = linear_loads_p (perm_cache, node_b, &is_b_linear);
> > + if (!vect_can_combine_node_p (load_perm_b, is_b_linear))
> > + return false;
> > +
> > + if (!load_perm_a.exists () || !load_perm_b.exists ())
> > + return false;
> > +
> > + /* Now we need to figure which node is first. */
> > + auto_vec<slp_tree> nodes;
> > + nodes.create (2);
> > + vec<std::pair<unsigned, unsigned> > perm;
> > + perm.create (2);
> > + if (load_perm_a[0] < load_perm_b[0])
> > + {
> > + perm.quick_push (std::make_pair (0, 0));
> > + perm.quick_push (std::make_pair (1, 0));
> > + }
> > + else
> > + {
> > + perm.quick_push (std::make_pair (1, 0));
> > + perm.quick_push (std::make_pair (0, 0));
> > + }
> > +
> > + nodes.quick_push (node_a);
> > + nodes.quick_push (node_b);
> > +
> > + slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > + SLP_TREE_LANE_PERMUTATION (vnode) = perm;
> > + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > + SLP_TREE_CHILDREN (vnode).safe_splice (nodes);
> > + SLP_TREE_REF_COUNT (vnode) = 1;
> > + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (parent);
> > + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> (parent);
> > + *result = vnode;
> > + return true;
> > +}
> > +
> > +/* Checks to see of the expression represented by NODE is a gimple
> assign with
> > + code CODE. */
> > +
> > +static inline bool
> > +vect_match_expression_p (slp_tree node, tree_code code)
> > +{
> > + if (!node
> > + || !SLP_TREE_REPRESENTATIVE (node))
> > + return false;
> > +
> > + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> (node));
> > + if (!is_gimple_assign (expr)
> > + || gimple_assign_rhs_code (expr) != code)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* Checks to see if the expression represented by NODE is a call to the
> internal
> > + function FN. */
> > +
> > +static inline bool
> > +vect_match_call_p (slp_tree node, internal_fn fn)
> > +{
> > + if (!node
> > + || !SLP_TREE_REPRESENTATIVE (node))
> > + return false;
> > +
> > + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> (node));
> > + if (!gimple_call_internal_p (expr, fn))
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* Checks to see if NODE is a two_operands node whom's operands are
> calls to the
> > + internal function FN. */
> > +
> > +static bool
> > +vect_match_two_call_p (slp_tree node, internal_fn fn)
> > +{
> > + if (!SLP_TREE_REPRESENTATIVE (node)
> > + || SLP_TREE_CODE (node) != VEC_PERM_EXPR
> > + || SLP_TREE_CHILDREN (node).length () != 2)
> > + return false;
> > +
> > + vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > +
> > + return vect_match_call_p (children[0], fn)
> > + && vect_match_call_p (children[1], fn);
> > +}
> > +
> > +/* Check if the given lane permute in PERMUTES matches an alternating
> sequence
> > + of {P0 P1 P0 P1 ...}. This to account for unrolled loops. Further mode
> > + there resulting permute must be linear. */
> > +
> > +static bool
> > +vect_check_lane_permute (lane_permutation_t &permutes,
> > + unsigned p0, unsigned p1)
> > +{
> > + if (permutes.length () == 0)
> > + return false;
> > +
> > + unsigned val[2] = {p0, p1};
> > + unsigned seed = permutes[0].second;
> > + for (unsigned i = 0; i < permutes.length (); i++)
> > + if (permutes[i].first != val[i % 2]
> > + || permutes[i].second != seed++)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* This function will match the two gimple expressions representing
> NODE1 and
> > + NODE2 in parallel and returns the pair operation that represents the two
> > + expressions in the two statements.
> > +
> > + If match is successful then the corresponding complex_operation is
> > + returned and the arguments to the two matched operations are
> returned in OPS.
> > +
> > + If TWO_OPERANDS it is expected that the LANES of the parent
> VEC_PERM select
> > + from the two nodes alternatingly.
> > +
> > + If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
> > +
> > + e.g. the following gimple statements
> > +
> > + stmt 0 _39 = _37 + _12;
> > + stmt 1 _6 = _38 - _36;
> > +
> > + will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
> > +*/
> > +
> > +static complex_operation_t
> > +vect_detect_pair_op (slp_tree node1, slp_tree node2,
> lane_permutation_t &lanes,
> > + bool two_operands = true, vec<slp_tree> *ops = NULL)
> > +{
> > + complex_operation_t result = CMPLX_NONE;
> > +
> > + if (vect_match_expression_p (node1, MINUS_EXPR)
> > + && vect_match_expression_p (node2, PLUS_EXPR)
> > + && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > + result = MINUS_PLUS;
> > + else if (vect_match_expression_p (node1, PLUS_EXPR)
> > + && vect_match_expression_p (node2, MINUS_EXPR)
> > + && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > + result = PLUS_MINUS;
> > + else if (vect_match_expression_p (node1, PLUS_EXPR)
> > + && vect_match_expression_p (node2, PLUS_EXPR))
> > + result = PLUS_PLUS;
> > + else if (vect_match_expression_p (node1, MULT_EXPR)
> > + && vect_match_expression_p (node2, MULT_EXPR))
> > + result = MULT_MULT;
> > +
> > + if (result != CMPLX_NONE && ops != NULL)
> > + {
> > + ops->create (2);
> > + ops->quick_push (node1);
> > + ops->quick_push (node2);
> > + }
> > + return result;
> > +}
> > +
> > +/* Overload of vect_detect_pair_op that matches against the
> representative
> > + statements in the children of NODE. It is expected that NODE has
> exactly
> > + two children and when TWO_OPERANDS then NODE must be a
> VEC_PERM. */
> > +
> > +static complex_operation_t
> > +vect_detect_pair_op (slp_tree node, bool two_operands = true,
> > + vec<slp_tree> *ops = NULL)
> > +{
> > + if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> > + return CMPLX_NONE;
> > +
> > + if (SLP_TREE_CHILDREN (node).length () != 2)
> > + return CMPLX_NONE;
> > +
> > + vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > + lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
> > +
> > + return vect_detect_pair_op (children[0], children[1], lanes,
> two_operands,
> > + ops);
> > +}
> > +
> > +/* This function marks every statement that is being replaced during the
> > + the pattern matching as PURE. Normally when replacing a statement
> due
> > + to a pattern we add the statement to the
> STMT_VINFO_PATTERN_DEF_SEQ of
> > + the pattern that is replacing them. In this case however this won't
> > + work as when doing the replacement we are changing the nodes that
> are
> > + used by the statements. This means that when vectorized the SSA chain
> > + is different than in the BB.
> > +
> > + Declaring the statements as part of the sequence will then cause SSA
> > + verification to fail as we may refer to statements that were not in the
> > + original USE-DEF chain of the statement we are replacing.
> > +
> > + The relevancy of the statements are saved and are later restored if SLP
> is to
> > + be dissolved. */
> > +
> > +static void
> > +vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree
> node)
> > +{
> > + if (cache->add (node))
> > + return;
> > +
> > + unsigned i;
> > + stmt_vec_info stmt_info;
> > + slp_tree child;
> > +
> > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> > + {
> > + if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info)))
> > + return;
> > +
> > + vect_save_relevancy (stmt_info);
> > + STMT_VINFO_SLP_VECT_ONLY (stmt_info) = true;
> > + STMT_SLP_TYPE (stmt_info) = pure_slp;
> > + }
> > +
> > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > + vect_mark_stmts_as_in_pattern (cache, child);
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* SLP Complex Numbers pattern matching.
> > +
> > + As an example, the following simple loop:
> > +
> > + double a[restrict N]; double b[restrict N]; double c[restrict N];
> > +
> > + for (int i=0; i < N; i+=2)
> > + {
> > + c[i] = a[i] - b[i+1];
> > + c[i+1] = a[i+1] + b[i];
> > + }
> > +
> > + which represents a complex addition on with a rotation of 90* around
> the
> > + argand plane. i.e. if `a` and `b` were complex numbers then this would be
> the
> > + same as `a + (b * I)`.
> > +
> > + Here the expressions for `c[i]` and `c[i+1]` are independent but have to
> be
> > + both recognized in order for the pattern to work. As an SLP tree this is
> > + represented as
> > +
> > + +--------------------------------+
> > + | stmt 0 *_9 = _10; |
> > + | stmt 1 *_15 = _16; |
> > + +--------------------------------+
> > + |
> > + |
> > + v
> > + +--------------------------------+
> > + | stmt 0 _10 = _4 - _8; |
> > + | stmt 1 _16 = _12 + _14; |
> > + | lane permutation { 0[0] 1[1] } |
> > + +--------------------------------+
> > + | |
> > + | |
> > + | |
> > + +-----+ | | +-----+
> > + | | | | | |
> > + +-----| { } |<-----+ +----->| { } --------+
> > + | | | +------------------| | |
> > + | +-----+ | +-----+ |
> > + | | | |
> > + | | | |
> > + | +------|------------------+ |
> > + | | | |
> > + v v v v
> > + +--------------------------+ +--------------------------------+
> > + | stmt 0 _8 = *_7; | | stmt 0 _4 = *_3; |
> > + | stmt 1 _14 = *_13; | | stmt 1 _12 = *_11; |
> > + | load permutation { 1 0 } | | load permutation { 0 1 } |
> > + +--------------------------+ +--------------------------------+
> > +
> > + The pattern matcher allows you to replace both statements 0 and 1 or
> none at
> > + all. Because this operation is a two operands operation the actual nodes
> > + being replaced are those in the { } nodes. The actual scalar statements
> > + themselves are not replaced or used during the matching but instead the
> > + SLP_TREE_REPRESENTATIVE statements are inspected. You are also
> allowed to
> > + replace and match on any number of nodes.
> > +
> > + Because the pattern matcher matches on the representative statement
> for the
> > + SLP node the case of two_operators it allows you to match the children
> of the
> > + node. This is done using the method `recognize ()`.
> > +
> > +*/
> > +
> > +/* The complex_pattern class contains common code for pattern
> matchers that work
> > + on complex numbers. These provide functionality to allow de-
> construction and
> > + validation of sequences depicting/transforming REAL and IMAG pairs. */
> > +
> > +class complex_pattern : public vect_pattern
> > +{
> > + protected:
> > + vec<slp_tree> *m_nodes = NULL;
> > + complex_pattern (slp_tree *node)
> > + : vect_pattern (node)
> > + { }
> > +
> > + public:
> > + bool validate_p (slp_tree_to_load_perm_map_t *);
> > + gcall *build (vec_info *);
> > +};
> > +
> > +/* The post transform and validation function for the complex number
> > + patterns. This will re-arrange the tree and re-organize the nodes such
> > + that they can be used by the complex number instructions that are to be
> > + created. It does this by doing the following steps:
> > +
> > + For each node that isn't already linear a new permute node is created
> which
> > + when applied makes the node linear. If such permutation does not exist
> then
> > + the function returns FALSE. For nodes that are already linear no work is
> > + done. */
> > +
> > +bool
> > +complex_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > + slp_tree node;
> > + unsigned ix;
> > + auto_vec<slp_tree> workset;
> > + FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > + {
> > + auto_vec<slp_tree> nodes;
> > + nodes.create (this->m_num_args);
> > + slp_tree tmp = NULL;
> > + auto_vec<slp_tree> new_nodes;
> > +
> > + unsigned i;
> > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
> > + {
> > + slp_tree vnode = NULL;
> > + if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > + {
> > + nodes.quick_push (vnode);
> > + if (vnode != tmp)
> > + new_nodes.safe_push (vnode);
> > + }
> > + else
> > + {
> > + FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> > + delete tmp;
> > +
> > + FOR_EACH_VEC_ELT (workset, i, tmp)
> > + delete tmp;
> > +
> > + nodes.release();
> > + return false;
> > + }
> > + }
> > +
> > + slp_tree new_node
> > + = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> > + SLP_TREE_CHILDREN (node).length ());
> > + SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> > + SLP_TREE_LANE_PERMUTATION (new_node) =
> SLP_TREE_LANE_PERMUTATION (node);
> > + SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> > + SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node);
> > + SLP_TREE_REPRESENTATIVE (new_node) =
> SLP_TREE_REPRESENTATIVE (node);
> > + SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> > + SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> > +
> > + workset.safe_push (new_node);
> > + }
> > +
> > + FOR_EACH_VEC_ELT (workset, ix, node)
> > + SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> > +
> > + return true;
> > +}
> > +
> > +
> > +/* Create a replacement pattern statement for each node in m_node and
> inserts
> > + the new statement into m_node as the new representative statement.
> The old
> > + statement is marked as being in a pattern defined by the new statement.
> The
> > + statement is created as call to internal function IFN with m_num_args
> > + arguments.
> > +
> > + Futhermore the new pattern is also added to the vectorization
> information
> > + structure VINFO and the old statement STMT_INFO is marked as unused
> while
> > + the new statement is marked as used and the number of SLP uses of the
> new
> > + statement is incremented.
> > +
> > + The newly created SLP nodes are marked as SLP only and will be
> dissolved
> > + if SLP is aborted.
> > +
> > + The newly created gimple call is returned and the BB remains unchanged.
> > +
> > + This default method is designed to only match against simple operands
> where
> > + all the input and output types are the same.
> > +*/
> > +
> > +gcall *
> > +complex_pattern::build (vec_info *vinfo)
> > +{
> > + stmt_vec_info stmt_info;
> > +
> > + auto_vec<tree> args;
> > + args.create (this->m_num_args);
> > + args.quick_grow_cleared (this->m_num_args);
> > + slp_tree node;
> > + unsigned ix;
> > + stmt_vec_info call_stmt_info;
> > + gcall *call_stmt = NULL;
> > +
> > + FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
>
> So elsewhere I see m_nodes assigned from SLP_TREE_CHILDREN
> of m_node. I assume m_node is the VEC_PERM of the two-operator
> operation. I expected we create a call for m_node here, not
> for all children?
It depends on the pattern. For most, particularly those of two_operands nodes,
unless I have mistaken something it's the children of the node that determines
what operation is being performed isn't it?
So for ADD for instance, the tree is
node 0x49aeae0 (max_nunits=4, refcnt=2)
stmt 0 REALPART_EXPR <*_10> = _4;
stmt 1 IMAGPART_EXPR <*_10> = _22;
children 0x49b0b20
node 0x49b0b20 (max_nunits=4, refcnt=2)
stmt 0 _4 = _13 - _6;
stmt 1 _22 = _8 + _12;
lane permutation { 0[0] 1[1] }
children 0x49b0430 0x49a73c0
node 0x49b0430 (max_nunits=1, refcnt=1)
{ }
children 0x49ad660 0x499e400
node 0x49ad660 (max_nunits=4, refcnt=3)
stmt 0 _13 = REALPART_EXPR <*_3>;
stmt 1 _12 = IMAGPART_EXPR <*_3>;
load permutation { 0 1 }
node 0x499e400 (max_nunits=4, refcnt=3)
stmt 0 _6 = IMAGPART_EXPR <*_5>;
stmt 1 _8 = REALPART_EXPR <*_5>;
load permutation { 1 0 }
node 0x49a73c0 (max_nunits=1, refcnt=1)
{ }
children 0x49ad660 0x499e400
where the VEC_PERM node itself just contains the permute and the children of it
0x49b0430 and 0x49a73c0 contain the operations - and + and from what I understood
the lane permute defined how the input is deinterleaved between the two nodes.
So when replacing them it's those two that I need to change and not VEC_PERM itself
otherwise the node could contain a permute and an operation that needs to be vectorized?
Or did you mean I should dissolve the two_operand node completely at this point? I was trying
to minimize the number of changes done on each replacement, because then it means I can assume
that any information I have already calculated of the children of the node being replaced is still the same
when doing recursive matching.
The tree directly out of the matcher is
node 0x49aeae0 (max_nunits=4, refcnt=2)
stmt 0 REALPART_EXPR <*_10> = _4;
stmt 1 IMAGPART_EXPR <*_10> = _22;
children 0x49b0b20
node 0x49b0b20 (max_nunits=4, refcnt=2)
stmt 0 _4 = _13 - _6;
stmt 1 _22 = _8 + _12;
lane permutation { 0[0] 1[1] }
children 0x4ade0c0 0x4ade220
node 0x4ade0c0 (max_nunits=1, refcnt=1)
{ }
SLP Only pattern:
slp_patt_41 = .COMPLEX_ADD_ROT90 (_pa_43, _pa_42);
children 0x49ad660 0x49b60f0
node 0x49ad660 (max_nunits=4, refcnt=3)
stmt 0 _13 = REALPART_EXPR <*_3>;
stmt 1 _12 = IMAGPART_EXPR <*_3>;
load permutation { 0 1 }
node 0x49b60f0 (max_nunits=1, refcnt=1)
{ }
lane permutation { 0[1] 0[0] }
children 0x499e400
node 0x499e400 (max_nunits=4, refcnt=3)
stmt 0 _6 = IMAGPART_EXPR <*_5>;
stmt 1 _8 = REALPART_EXPR <*_5>;
load permutation { 1 0 }
node 0x4ade220 (max_nunits=1, refcnt=1)
{ }
SLP Only pattern:
slp_patt_32 = .COMPLEX_ADD_ROT90 (_pa_40, _pa_33);
children 0x49ad660 0x4ade180
node 0x4ade180 (max_nunits=1, refcnt=1)
{ }
lane permutation { 0[1] 0[0] }
children 0x499e400
The one exception to this is when the node being replaced isn't a two_operands node,
like is the case with FMA. In that case the node is just an ADD node which itself is replaced.
This is why it's a variable, so I can tell it where to perform the replacement.
Thanks,
Tamar
>
> > + {
> > + /* Calculate the location of the statement in NODE to replace. */
> > + stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > + gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> > + tree type = TREE_TYPE (gimple_get_lhs (old_stmt));
> > +
> > + /* Create the argument set for use by gimple_build_call_internal_vec.
> */
> > + for (unsigned i = 0; i < this->m_num_args; i++)
> > + {
> > + tree var = make_temp_ssa_name (type, old_stmt, "_pa");
>
> Why do you need to create N new SSA names (all with the same
> definition statement)? It looks like this new "scalar" stmt
> is only a template stored in SLP_TREE_REPRESENTATIVE
> and supposedly there needs to be "something" in the actual
> arguments it make us not crash later? Can't you for example
> simply push the old_stmt LHS N times?
>
> For VEC_PERM I originally wanted to set SLP_TREE_REPRESENTATIVE
> to NULL_TREE but we don't like that obviously. So long-term
> we might want to change SLP_TREE_CODE to a match_code to allow
> tree codes and internal-functions and "ignore" the representative
> in case it is not ERROR_MARK.
>
> > + args[i] = var;
> > + }
> > +
> > + /* Create the new pattern statements. */
> > + call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
> > + tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
> > + gimple_call_set_lhs (call_stmt, var);
> > + gimple_set_location (call_stmt, gimple_location (old_stmt));
> > + gimple_call_set_nothrow (call_stmt, true);
> > +
> > + /* Adjust the book-keeping for the new and old statements for use
> during
> > + SLP. This is required to get the right VF and statement during SLP
> > + analysis. These changes are created after relevancy has been set for
> > + the nodes as such we need to manually update them. Any changes
> will be
> > + undone if SLP is cancelled. */
> > + call_stmt_info
> > + = vinfo->add_pattern_stmt (call_stmt, stmt_info);
> > +
> > + /* add_pattern_stmt can't be done in vect_mark_pattern_stmts
> because
> > + the non-SLP pattern matchers already have added the statement to
> VINFO
> > + by the time it is called. Some of them need to modify the returned
> > + stmt_info. vect_mark_pattern_stmts is called by recog_pattern and
> it
> > + would increase the size of each pattern with boilerplate code to
> make
> > + the call there. */
> > + vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
> > + SLP_TREE_VECTYPE (node));
> > + STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > + STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> > +
> > + /* Since we are replacing all the statements in the group with the same
> > + thing it doesn't really matter. So just set it every time a new stmt
> > + is created. */
> > + SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
> > + SLP_TREE_CODE (node) = CALL_EXPR;
> > + }
> > + return call_stmt;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_add_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_add_pattern : public complex_pattern
> > +{
> > + protected:
> > + complex_add_pattern (slp_tree *node)
> > + : complex_pattern (node)
> > + {
> > + this->m_num_args = 2;
> > + }
> > +
> > + public:
> > + static vect_pattern* create (slp_tree *node)
> > + {
> > + return new complex_add_pattern (node);
> > + }
> > +
> > + const char* get_name ()
> > + {
> > + return "Complex Addition";
> > + }
> > +
> > + bool matches (slp_tree_to_load_perm_map_t *);
> > + bool matches (complex_operation_t op,
> slp_tree_to_load_perm_map_t *,
> > + vec<slp_tree> ops);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex addition pattern in SLP
> tree.
> > +
> > + If no match is found then IFN is set to IFN_LAST.
> > + This function matches the patterns shaped as:
> > +
> > + c[i] = a[i] - b[i+1];
> > + c[i+1] = a[i+1] + b[i];
> > +
> > + If a match occurred then TRUE is returned, else FALSE. The initial match
> is
> > + expected to be in OP1 and the initial match operands in args0. */
> > +
> > +bool
> > +complex_add_pattern::matches (complex_operation_t op,
> > + slp_tree_to_load_perm_map_t *perm_cache,
> > + vec<slp_tree> ops)
> > +{
> > + this->m_ifn = IFN_LAST;
> > + this->m_ops.safe_splice (ops);
> > +
> > + /* Find the two components. Rotation in the complex plane will modify
> > + the operations:
> > +
> > + * Rotation 0: + +
> > + * Rotation 90: - +
> > + * Rotation 180: - -
> > + * Rotation 270: + -
> > +
> > + Rotation 0 and 180 can be handled by normal SIMD code, so we don't
> need
> > + to care about them here. */
> > + if (op == MINUS_PLUS)
> > + this->m_ifn = IFN_COMPLEX_ADD_ROT90;
> > + else if (op == PLUS_MINUS)
> > + this->m_ifn = IFN_COMPLEX_ADD_ROT270;
> > +
> > + /* verify that there is a permute, otherwise this isn't a pattern we
> > + we support. */
> > + bool is_linear = false;
> > + bool all_linear = true;
> > + unsigned x;
> > + for (x = 0; x < this->m_ops.length () && all_linear; x++)
> > + {
> > + linear_loads_p (perm_cache, this->m_ops[x], &is_linear);
> > + all_linear = all_linear && is_linear;
> > + }
> > +
> > + if (this->m_ifn == IFN_LAST || all_linear)
> > + return false;
> > +
> > + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > + return true;
> > +}
> > +
> > +bool
> > +complex_add_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > + complex_operation_t op
> > + = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> > + return matches (op, perm_cache, this->m_ops);
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_mul_pattern
> > +
> **********************************************************
> ********************/
> > +
> > +/* Helper function of that looks for a match in the CHILDth child of NODE.
> The
> > + child used is stored in RES.
> > +
> > + If the match is successful then ARGS will contain the operands matched
> > + and the complex_operation_t type is returned. If match is not
> successful
> > + then CMPLX_NONE is returned and ARGS is left unmodified. */
> > +
> > +static inline complex_operation_t
> > +vect_match_call_complex_mla (slp_tree node, unsigned child,
> > + vec<slp_tree> *args = NULL, slp_tree *res = NULL)
> > +{
> > + gcc_assert (child < SLP_TREE_CHILDREN (node).length ());
> > +
> > + slp_tree data = SLP_TREE_CHILDREN (node)[child];
> > +
> > + if (res)
> > + *res = data;
> > +
> > + return vect_detect_pair_op (data, false, args);
> > +}
> > +
> > +/* Check to see if either of the trees in ARGS are a NEGATE_EXPR. If the
> first
> > + child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> > +
> > + If a negate is found then the values in ARGS are reordered such that the
> > + negate node is always the second one and the entry is replaced by the
> child
> > + of the negate node. */
> > +
> > +static inline bool
> > +vect_normalize_conj_loc (vec<slp_tree> args, bool *neg_first_p = NULL)
> > +{
> > + gcc_assert (args.length () == 2);
> > + bool neg_found = false;
> > +
> > + if (vect_match_expression_p (args[0], NEGATE_EXPR))
> > + {
> > + std::swap (args[0], args[1]);
> > + neg_found = true;
> > + if (neg_first_p)
> > + *neg_first_p = true;
> > + }
> > + else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> > + {
> > + neg_found = true;
> > + if (neg_first_p)
> > + *neg_first_p = false;
> > + }
> > +
> > + if (neg_found)
> > + args[1] = SLP_TREE_CHILDREN (args[1])[0];
> > +
> > + return neg_found;
> > +}
> > +
> > +/* Helper function that checks to see if the load permutation PERM is on
> that
> > + belongs to a duplication operation. i.e. checks to see if all the values in
> > + PERM are the same. */
> > +
> > +static inline bool
> > +vect_is_dup (load_permutation_t perm, unsigned idx)
> > +{
> > + for (unsigned i = 0; i < perm.length (); i++)
> > + if (perm[i] != idx)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both
> MULT_EXPR
> > + nodes but also that they represent an operation that is either a complex
> > + multiplication or a complex multiplication by conjucated value.
> > +
> > + Of the negation is expected to be in the first half of the tree (As
> required
> > + by an FMS pattern) then NEG_FIRST is true. If the operation is a
> conjucate
> > + operation then CONJ_FIRST_OPERAND is set to indicate whether the
> first or
> > + second operand contains the conjucate operation. */
> > +
> > +static inline bool
> > +vect_validate_multiplication (slp_tree_to_load_perm_map_t
> *perm_cache,
> > + vec<slp_tree> left_op, vec<slp_tree> right_op,
> > + bool neg_first, bool *conj_first_operand)
> > +{
> > + /* The presence of a negation indicates that we have either a conjucate
> or a
> > + rotation. We need to distinguish which one. */
> > + *conj_first_operand = false;
> > +
> > + load_permutation_t perm;
> > + bool is_linear;
> > + /* Complex conjucates have the negation on the imaginary part of the
> > + number where rotations affect the real component. So check if the
> > + negation is on a dup of lane 1. */
> > + perm = linear_loads_p (perm_cache, right_op[1], &is_linear);
> > + if (is_linear || !vect_is_dup (perm, 1))
> > + return false;
> > +
> > + /* Check if the conjucate is on the second first or second operand. The
> > + order of the node with the conjucate value determines this, and the
> dup
> > + node must be one of lane 0 of the same DR as the neg node. */
> > + perm = linear_loads_p (perm_cache, left_op[0], &is_linear);
> > + if (is_linear)
> > + {
> > + perm = linear_loads_p (perm_cache, left_op[1], &is_linear);
> > + if (is_linear)
> > + return false;
> > + }
> > + else if (!neg_first)
> > + *conj_first_operand = true;
> > + else
> > + return false;
> > +
> > + if (!vect_is_dup (perm, 0))
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* Helper function to help distinguish between a conjucate and a rotation
> in a
> > + complex multiplication. The operations have similar shapes but the
> order of
> > + the load permutes are different. This function returns TRUE when the
> order
> > + is consistent with a multiplication or multiplication by conjucated
> > + operand but returns FALSE if it's a multiplication by rotated operand. */
> > +
> > +static inline bool
> > +vect_validate_multiplication (slp_tree_to_load_perm_map_t
> *perm_cache,
> > + vec<slp_tree> op, unsigned idx)
> > +{
> > + /* The left node is the more common case, test it first. */
> > + bool is_linear;
> > + load_permutation_t perm = linear_loads_p (perm_cache, op[0],
> &is_linear);
> > + if (is_linear || !vect_is_dup (perm, idx))
> > + {
> > + perm = linear_loads_p (perm_cache, op[1], &is_linear);
> > + if (is_linear || !vect_is_dup (perm, idx))
> > + return false;
> > + }
> > +
> > + return true;
> > +}
> > +
> > +class complex_mul_pattern : public complex_pattern
> > +{
> > + protected:
> > + /* Allocate enough space for FMA as well. */
> > + map_t m_blocks[6] = {};
> > + bool m_inplace = false;
> > + complex_mul_pattern (slp_tree *node)
> > + : complex_pattern (node)
> > + {
> > + this->m_num_args = 2;
> > + }
> > +
> > + public:
> > + static vect_pattern* create (slp_tree *node)
> > + {
> > + return new complex_mul_pattern (node);
> > + }
> > +
> > + const char* get_name ()
> > + {
> > + return "Complex Multiplication";
> > + }
> > +
> > + bool validate_p (slp_tree_to_load_perm_map_t *);
> > + bool matches (slp_tree_to_load_perm_map_t *);
> > + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > + vec<slp_tree>);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex multiply pattern in SLP
> tree
> > + If the operation matches then IFN is set to the operation it matched
> > + and the arguments to the two replacement statements are put in
> m_ops.
> > +
> > + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > + This function matches the patterns shaped as:
> > +
> > + double ax = (b[i+1] * a[i]);
> > + double bx = (a[i+1] * b[i]);
> > +
> > + c[i] = c[i] - ax;
> > + c[i+1] = c[i+1] + bx;
> > +
> > + If a match occurred then TRUE is returned, else FALSE. The initial match
> is
> > + expected to be in OP1 and the initial match operands in args0. */
> > +
> > +bool
> > +complex_mul_pattern::matches (complex_operation_t op,
> > + slp_tree_to_load_perm_map_t *perm_cache,
> > + vec<slp_tree> /* ops */)
> > +{
> > + this->m_ifn = IFN_LAST;
> > +
> > + if (op != MINUS_PLUS)
> > + return false;
> > +
> > + slp_tree root = *this->m_node;
> > + /* First two nodes must be a multiply. */
> > + auto_vec<slp_tree> muls;
> > + if (vect_match_call_complex_mla (root, 0) != MULT_MULT
> > + || vect_match_call_complex_mla (root, 1, &muls) != MULT_MULT)
> > + return false;
> > +
> > + /* Now operand2+4 may lead to another expression. */
> > + auto_vec<slp_tree> left_op, right_op;
> > + left_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> > + right_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> > +
> > + bool neg_first;
> > + bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> > +
> > + if (!is_neg)
> > + {
> > + /* A multiplication needs to multiply agains the real pair, otherwise
> > + the pattern matches that of FMS. */
> > + if (!vect_validate_multiplication (perm_cache, left_op, 0))
> > + return false;
> > + this->m_ifn = IFN_COMPLEX_MUL;
> > + this->m_ops.safe_splice (left_op);
> > + this->m_ops.safe_splice (right_op);
> > + }
> > + else if (is_neg)
> > + {
> > + bool conj_first_operand;
> > + if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> > + neg_first, &conj_first_operand))
> > + return false;
> > +
> > + this->m_ifn = IFN_COMPLEX_MUL_CONJ;
> > + /* Check if the conjucate is on the first or second parameter. */
> > + if (!conj_first_operand)
> > + {
> > + this->m_ops.safe_push (left_op[1]);
> > + this->m_ops.safe_push (left_op[0]);
> > + }
> > + else
> > + {
> > + this->m_ops.safe_push (left_op[0]);
> > + this->m_ops.safe_push (left_op[1]);
> > + }
> > + this->m_ops.safe_push (right_op[1]);
> > + this->m_ops.safe_push (right_op[0]);
> > + }
> > +
> > + vect_build_perm_groups (perm_cache, &this->m_blocks[0], this-
> >m_ops);
> > + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > +
> > + return true;
> > +}
> > +
> > +bool
> > +complex_mul_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > + complex_operation_t op
> > + = vect_detect_pair_op (*this->m_node);
> > + return matches (op, perm_cache, this->m_ops);
> > +}
> > +
> > +
> > +/* Validates to see if the Complex MUL that we have matched is valid.
> This is
> > + done through a combination of making nodes linear and combining
> nodes. */
> > +
> > +bool
> > +complex_mul_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > + slp_tree node;
> > + unsigned ix;
> > + hash_set<slp_tree> cache;
> > + auto_vec<slp_tree> workset;
> > + FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> > + {
> > + auto_vec<slp_tree> nodes;
> > + nodes.create (this->m_num_args);
> > + slp_tree tmp = NULL;
> > + auto_vec<slp_tree> new_nodes;
> > +
> > + unsigned i;
> > + for (i = 0; i < this->m_num_args; i++)
> > + {
> > + unsigned index = (ix * this->m_num_args) + i;
> > + map_t map = this->m_blocks[index];
> > + slp_tree vnode = NULL;
> > + bool needs_linear = map.a == map.b;
> > + tmp = this->m_ops[index];
> > + cache.add (tmp);
> > + if (needs_linear
> > + && vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > + {
> > + nodes.quick_push (vnode);
> > + if (vnode != tmp)
> > + new_nodes.safe_push (vnode);
> > + }
> > + else if (!needs_linear
> > + && vect_slp_make_combine_linear (perm_cache, node,
> > + this->m_ops, map,
> &vnode))
> > + {
> > + nodes.quick_push (vnode);
> > + if (vnode != tmp)
> > + new_nodes.safe_push (vnode);
> > + }
> > + else
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "stmts could not be made %s %p\n",
> > + needs_linear ? "linear" : "linear/combined",
> > + tmp);
> > + FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> > + delete tmp;
> > +
> > + if (!m_inplace)
> > + FOR_EACH_VEC_ELT (workset, i, tmp)
> > + delete tmp;
> > +
> > + nodes.release();
> > + return false;
> > + }
> > +
> > + vect_mark_stmts_as_in_pattern (&cache, node);
> > + }
> > +
> > + if (m_inplace)
> > + {
> > + workset.safe_splice (nodes);
> > + }
> > + else
> > + {
> > + slp_tree new_node
> > + = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> > + SLP_TREE_CHILDREN (node).length
> ());
> > + SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> > + SLP_TREE_LANE_PERMUTATION (new_node)
> > + = SLP_TREE_LANE_PERMUTATION (node);
> > + SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> > + SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT
> (node);
> > + SLP_TREE_REPRESENTATIVE (new_node) =
> SLP_TREE_REPRESENTATIVE (node);
> > + SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> > + SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> > +
> > + workset.safe_push (new_node);
> > + }
> > + }
> > +
> > + if (m_inplace)
> > + {
> > + SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > + SLP_TREE_CHILDREN (*this->m_node).safe_splice (workset);
> > + }
> > + else
> > + {
> > + FOR_EACH_VEC_ELT (workset, ix, node)
> > + SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> > + }
> > +
> > + return true;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_fma_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_fma_pattern : public complex_mul_pattern
> > +{
> > + protected:
> > + /* Temporary nodes cache as scope of the class so we don't
> > + have to worry about freeing it. */
> > + auto_vec<slp_tree> m_ncache;
> > + /* List of nodes to mark as no longer being a mul. The first
> > + one is chosen and the last one is dangling. */
> > + auto_vec<slp_tree> m_mul_nodes;
> > + complex_fma_pattern (slp_tree *node)
> > + : complex_mul_pattern (node)
> > + {
> > + this->m_num_args = 3;
> > + }
> > +
> > + public:
> > + static vect_pattern* create (slp_tree *node)
> > + {
> > + return new complex_fma_pattern (node);
> > + }
> > +
> > + const char* get_name ()
> > + {
> > + return "Complex FMA";
> > + }
> > +
> > + bool matches (slp_tree_to_load_perm_map_t *);
> > + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > + vec<slp_tree> ops);
> > + bool validate_p (slp_tree_to_load_perm_map_t *);
> > +};
> > +
> > +/* Helper function to "reset" a previously matched node and undo the
> changes
> > + made enough so that the node is treated as an irrelevant node. */
> > +
> > +static inline void
> > +vect_slp_reset_pattern (slp_tree node)
> > +{
> > + stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE
> (node));
> > + STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
> > + STMT_SLP_TYPE (stmt_info) = pure_slp;
> > + SLP_TREE_REPRESENTATIVE (node) = stmt_info;
> > +}
> > +
> > +/* Pattern matcher for trying to match complex multiply and accumulate
> > + and multiply and subtract patterns in SLP tree.
> > + If the operation matches then IFN is set to the operation it matched and
> > + the arguments to the two replacement statements are put in m_ops.
> > +
> > + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > + This function matches the patterns shaped as:
> > +
> > + double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> > + double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> > +
> > + c[i] = c[i] - ax;
> > + c[i+1] = c[i+1] + bx;
> > +
> > + If a match occurred then TRUE is returned, else FALSE. The match is
> > + performed after COMPLEX_MUL which would have done the majority of
> the work.
> > + This function merely matches an ADD with a COMPLEX_MUL IFN. The
> initial
> > + match is expected to be in OP1 and the initial match operands in args0.
> */
> > +
> > +bool
> > +complex_fma_pattern::matches (complex_operation_t op1,
> > + slp_tree_to_load_perm_map_t * /* perm_cache
> */,
> > + vec<slp_tree> /* args0 */)
> > +{
> > + this->m_ifn = IFN_LAST;
> > +
> > + /* Find the two components. We match Complex MUL first which
> reduces the
> > + amount of work this pattern has to do. After that we just match the
> > + head node and we're done.:
> > +
> > + * FMA: + +.
> > +
> > + We need to ignore the two_operands nodes that may also match.
> > + For that we can check if they have any scalar statements and also
> > + check that it's not a permute node as we're looking for a normal
> > + PLUS_EXPR operation. */
> > + if (op1 != CMPLX_NONE)
> > + return false;
> > +
> > + /* Find the two components. We match Complex MUL first which
> reduces the
> > + amount of work this pattern has to do. After that we just match the
> > + head node and we're done.:
> > +
> > + * FMA: + + on a non-two_operands node. */
> > + if (SLP_TREE_LANE_PERMUTATION (*this->m_node).exists ()
> > + || !SLP_TREE_SCALAR_STMTS (*this->m_node).exists ()
> > + || !vect_match_expression_p (*this->m_node, PLUS_EXPR))
> > + return false;
> > +
> > + slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1];
> > +
> > + if (vect_match_two_call_p (node, IFN_COMPLEX_MUL))
> > + this->m_ifn = IFN_COMPLEX_FMA;
> > + else if (vect_match_two_call_p (node, IFN_COMPLEX_MUL_CONJ))
> > + this->m_ifn = IFN_COMPLEX_FMA_CONJ;
> > + else
> > + return false;
> > +
> > + this->m_mul_nodes.safe_splice (SLP_TREE_CHILDREN (node));
> > + node = SLP_TREE_CHILDREN (node)[0];
> > + this->m_ops.create (this->m_num_args);
> > +
> > + if (this->m_ifn == IFN_COMPLEX_FMA)
> > + {
> > + this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> > + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> > + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> > + }
> > + else
> > + {
> > + this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> > + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> > + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> > + }
> > +
> > + this->m_ncache.safe_push (*this->m_node);
> > + this->m_nodes = &this->m_ncache;
> > +
> > + return true;
> > +}
> > +
> > +bool
> > +complex_fma_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > + auto_vec<slp_tree> args0;
> > + complex_operation_t op
> > + = vect_detect_pair_op (*this->m_node, true, &args0);
> > + return matches (op, perm_cache, args0);
> > +}
> > +
> > +bool
> > +complex_fma_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > + /* FMA's top level node is + + and so is not two_operator,
> > + we only need one child. */
> > + slp_tree node;
> > + auto_vec<slp_tree> nodes;
> > + auto_vec<slp_tree> new_nodes;
> > + nodes.create (this->m_num_args);
> > + hash_set<slp_tree> cache;
> > + unsigned ix;
> > + FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > + {
> > + slp_tree vnode = NULL;
> > + if (vect_slp_make_linear (perm_cache, *this->m_node, node,
> &vnode))
> > + {
> > + nodes.quick_push (vnode);
> > + if (vnode != node)
> > + new_nodes.safe_push (vnode);
> > + if (SLP_TREE_CHILDREN (node).length () == 2)
> > + {
> > + cache.add(SLP_TREE_CHILDREN (node)[0]);
> > + cache.add(SLP_TREE_CHILDREN (node)[1]);
> > + }
> > + else
> > + cache.add (node);
> > + }
> > + else
> > + {
> > + FOR_EACH_VEC_ELT (new_nodes, ix, node)
> > + delete node;
> > +
> > + nodes.release();
> > + return false;
> > + }
> > + }
> > +
> > + SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > + SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
> > +
> > + /* Ignore the part of the tree that becomes irrelevant for FMA. */
> > + vect_slp_reset_pattern (this->m_mul_nodes[0]);
> > + vect_slp_reset_pattern (this->m_mul_nodes[1]);
> > + vect_mark_stmts_as_in_pattern (&cache, this->m_mul_nodes[1]);
> > +
> > + return true;
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_fms_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_fms_pattern : public complex_mul_pattern
> > +{
> > + protected:
> > + complex_fms_pattern (slp_tree *node)
> > + : complex_mul_pattern (node)
> > + {
> > + this->m_num_args = 3;
> > + }
> > +
> > + public:
> > + static vect_pattern* create (slp_tree *node)
> > + {
> > + return new complex_fms_pattern (node);
> > + }
> > +
> > + const char* get_name ()
> > + {
> > + return "Complex FMS";
> > + }
> > +
> > + bool matches (slp_tree_to_load_perm_map_t *);
> > + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > + vec<slp_tree>);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex multiply and accumulate
> > + and multiply and subtract patterns in SLP tree.
> > + If the operation matches then IFN is set to the operation it matched and
> > + the arguments to the two replacement statements are put in m_ops.
> > +
> > + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > + This function matches the patterns shaped as:
> > +
> > + double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> > + double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> > +
> > + c[i] = c[i] - ax;
> > + c[i+1] = c[i+1] + bx;
> > +
> > + If a match occurred then TRUE is returned, else FALSE. The initial match
> is
> > + expected to be in OP1 and the initial match operands in args0. */
> > +bool
> > +complex_fms_pattern::matches (complex_operation_t op1,
> > + slp_tree_to_load_perm_map_t *perm_cache,
> > + vec<slp_tree> args0)
> > +{
> > + this->m_ifn = IFN_LAST;
> > +
> > + /* Find the two components. We match Complex MUL first which
> reduces the
> > + amount of work this pattern has to do. After that we just match the
> > + head node and we're done.:
> > +
> > + * FMS: - +. */
> > + slp_tree child = NULL;
> > +
> > + /* We need to ignore the two_operands nodes that may also match,
> > + for that we can check if they have any scalar statements and also
> > + check that it's not a permute node as we're looking for a normal
> > + PLUS_EXPR operation. */
> > + if (op1 != PLUS_MINUS)
> > + return false;
> > +
> > + child = SLP_TREE_CHILDREN (args0[1])[1];
> > + if (vect_detect_pair_op (child) != MINUS_PLUS)
> > + return false;
> > +
> > + /* First two nodes must be a multiply. */
> > + auto_vec<slp_tree> muls;
> > + if (vect_match_call_complex_mla (child, 0) != MULT_MULT
> > + || vect_match_call_complex_mla (child, 1, &muls) != MULT_MULT)
> > + return false;
> > +
> > + /* Now operand2+4 may lead to another expression. */
> > + auto_vec<slp_tree> left_op, right_op;
> > + left_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> > + right_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> > +
> > + bool is_neg = vect_normalize_conj_loc (right_op);
> > +
> > + this->m_ops.create (6);
> > + child = SLP_TREE_CHILDREN (args0[0])[0];
> > + this->m_ops.quick_push (child);
> > + if (!is_neg)
> > + {
> > + this->m_ifn = IFN_COMPLEX_FMS;
> > + this->m_ops.quick_push (left_op[1]);
> > + this->m_ops.quick_push (left_op[0]);
> > + }
> > + else if (is_neg)
> > + {
> > + bool conj_first_operand;
> > + if (!vect_validate_multiplication (perm_cache, left_op, right_op, false,
> > + &conj_first_operand))
> > + return false;
> > +
> > + this->m_ifn = IFN_COMPLEX_FMS_CONJ;
> > + /* Check if the conjucate is on the first or second parameter. */
> > + if (!conj_first_operand)
> > + {
> > + this->m_ops.quick_push (left_op[1]);
> > + this->m_ops.quick_push (left_op[0]);
> > + }
> > + else
> > + {
> > + this->m_ops.quick_push (left_op[0]);
> > + this->m_ops.quick_push (left_op[1]);
> > + }
> > + }
> > +
> > + this->m_ops.quick_push (child);
> > + this->m_ops.quick_push (right_op[1]);
> > + this->m_ops.quick_push (right_op[0]);
> > +
> > + vect_build_perm_groups (perm_cache, &this->m_blocks[0], this-
> >m_ops);
> > + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > + return true;
> > +}
> > +
> > +bool
> > +complex_fms_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > + auto_vec<slp_tree> args0;
> > + complex_operation_t op
> > + = vect_detect_pair_op (*this->m_node, true, &args0);
> > + return matches (op, perm_cache, args0);
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_operations_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* This function combines all the existing pattern matchers above into one
> class
> > + that shares the functionality between them. The initial match is shared
> > + between all complex operations. */
> > +
> > +class complex_operations_pattern : public complex_pattern
> > +{
> > + protected:
> > + complex_operations_pattern (slp_tree *node)
> > + : complex_pattern (node)
> > + { }
> > +
> > + gcall *build ();
> > +
> > + /* The current "active" pattern. */
> > + vect_pattern *m_patt = NULL;
> > +
> > + public:
> > + using complex_pattern::matches;
> > +
> > + bool matches (slp_tree_to_load_perm_map_t *perm_cache);
> > + const char* get_name ();
> > + bool validate_p (slp_tree_to_load_perm_map_t *perm_cache);
> > + bool is_optab_supported_p (tree, optimization_type);
> > + internal_fn get_ifn ();
> > + gcall *build (vec_info *);
> > + void reset (slp_tree *);
> > + ~complex_operations_pattern ();
> > +
> > + static vect_pattern* create (slp_tree *node)
> > + {
> > + return new complex_operations_pattern (node);
> > + }
> > +};
> > +
> > +
> > +/* Clean up the pattern if one existed. */
> > +
> > +complex_operations_pattern::~complex_operations_pattern ()
> > +{
> > + if (this->m_patt)
> > + delete this->m_patt;
> > + this->m_patt = NULL;
> > +}
> > +
> > +/* Match but do not perform any additional operations on the SLP tree. If
> the
> > + match is successful then that pattern is made "active" in the class. */
> > +
> > +bool
> > +complex_operations_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > + complex_operation_t op
> > + = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> > +
> > + /* Check which pattern this may be. Match longest pattern first. */
> > + this->m_patt = complex_fma_pattern::create (this->m_node);
> > + if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > + return true;
>
> Hmm, all of these allocations like also
>
> + for (unsigned x = 0; x < num__slp_patterns; x++)
> + {
> + vect_pattern *pattern = slp_patterns[x] (ref_node);
> + found_p = pattern->recognize (perm_cache, vinfo);
> + delete pattern;
> + found_rec_p = found_p | found_rec_p;
> + }
>
> in the main driver are really due to the C++-ish layout of all this :/
> I don't like that very much. It also highlights when reading
> the patch that it's hard to get the difference betwee
> recognize (), match () and validate () where all but the first is
> kind-of internal implementation details of the complex matchers.
>
> So we could eventually get by with the main driver doing
>
> + for (unsigned x = 0; x < num__slp_patterns; x++)
> + {
> + vect_pattern *pattern = slp_patterns[x] (ref_node);
> if (pattern)
> found_rec_p = true;
> }
>
> thus merge ->recognize with the static allocator and not do
> any heap allocation when nothing is matched? Then ...
>
> > + if (op == CMPLX_NONE)
> > + return false;
> > +
> > + delete this->m_patt;
> > +
> > + this->m_patt = complex_fms_pattern::create (this->m_node);
>
> ... this could use static typing and an automatic instance?
>
> The myth that allocation & deallocation is cheap is really that,
> a myth. We're doing 6 allocations & deallocations at this
> very toplevel for each SLP node (which means eventually for
> each statement in a function).
>
>
> So why for example would one need to override recognize ()?
> I'd argue for nailing it in place and to remove validate_p ()
> (should be done by matches?). At least validate_p for
> the complex patterns seem to build SLP nodes which mean
> they actually build()?
>
>
> I think overall the patch looks sensible but it's still quite
> hard to follow the flow of an actual pattern match, possibly
> because of the strange factoring.
>
> At this point to speed up things can you split the patch
> so there's only the very simplest (complex_add for example)
> matcher plus requirements? Then we can push that which
> would allow me to actually play with it more easily and
> even try out changes?
>
> I'd say split out the optimize_load_redistribution part of
> the first patch and merge the complex_add support into
> its remains so we have a first patch that does something?
>
> Thanks and sorry for being dense here ... :/ There's pieces
> I think we should do better but I still can't lay out a
> solution and I guess pointing fingers at the individual pieces
> like above will lead to very slow progress only.
>
> Richard.
>
> > + if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > + return true;
> > +
> > + delete this->m_patt;
> > +
> > + this->m_patt = complex_mul_pattern::create (this->m_node);
> > + if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > + return true;
> > +
> > + delete this->m_patt;
> > +
> > + this->m_patt = complex_add_pattern::create (this->m_node);
> > + if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > + return true;
> > +
> > + delete this->m_patt;
> > +
> > + this->m_patt = NULL;
> > + return false;
> > +}
> > +
> > +/* Friendly name of the operation the pattern matches. */
> > +
> > +const char*
> > +complex_operations_pattern::get_name ()
> > +{
> > + if (!this->m_patt)
> > + return "Complex Operations";
> > +
> > + return this->m_patt->get_name ();
> > +}
> > +
> > +/* Only perform the pattern creation part of the matcher. This creates
> and
> > + returns the new pattern statement and encapsulates the currently
> active
> > + pattern. */
> > +
> > +gcall *
> > +complex_operations_pattern::build (vec_info *vinfo)
> > +{
> > + if (!this->m_patt)
> > + return NULL;
> > +
> > + return this->m_patt->build (vinfo);
> > +}
> > +
> > +/* Validate the operation the currently active pattern detected. */
> > +
> > +bool
> > +complex_operations_pattern::validate_p
> (slp_tree_to_load_perm_map_t *perm_cache)
> > +{
> > + if (!this->m_patt)
> > + return false;
> > +
> > + return this->m_patt->validate_p (perm_cache);
> > +}
> > +
> > +/* Check if the obtab of the currently active matches is supported. */
> > +
> > +bool
> > +complex_operations_pattern::is_optab_supported_p (tree vectype,
> > + optimization_type opt_type)
> > +{
> > + if (!vectype || !this->m_patt)
> > + return false;
> > +
> > + return this->m_patt->is_optab_supported_p (vectype, opt_type);
> > +}
> > +
> > +/* Return the matched IFN from the current active matcher if any. */
> > +
> > +internal_fn
> > +complex_operations_pattern::get_ifn ()
> > +{
> > + if (!this->m_patt)
> > + return IFN_LAST;
> > +
> > + return this->m_patt->get_ifn ();
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * Pattern matching definitions
> > +
> **********************************************************
> ********************/
> > +
> > +#define SLP_PATTERN(x) &x::create
> > +vect_pattern_decl_t slp_patterns[]
> > +{
> > + /* For least amount of back-tracking and more efficient matching
> > + order patterns from the largest to the smallest. Especially if they
> > + overlap in what they can detect. */
> > +
> > + /* FMA overlaps with MUL but is the longer sequence. Because we're in
> post
> > + order traversal we can't match FMA if included in
> > + complex_operations_pattern so must be checked on it's own. */
> > + SLP_PATTERN (complex_operations_pattern),
> > +};
> > +#undef SLP_PATTERN
> > +
> > +/* Set the number of SLP pattern matchers available. */
> > +size_t num__slp_patterns =
> sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg,
> Germany; GF: Felix Imend
next prev parent reply other threads:[~2020-11-16 15:55 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-14 14:37 [PATCH 1/2] middle-end : Initial scaffolding and definitions for SLP patttern matches Tamar Christina
2020-11-14 14:39 ` [PATCH 2/2] middle-end : Add support for complex pattern detection for Add, Mul, FMA and FMS Tamar Christina
2020-11-16 15:11 ` Richard Biener
2020-11-16 15:54 ` Tamar Christina [this message]
2020-11-17 7:50 ` Richard Biener
2020-11-16 13:16 ` [PATCH 1/2] middle-end : Initial scaffolding and definitions for SLP patttern matches Richard Biener
2020-11-16 13:33 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=VI1PR08MB532584C37C8F6A2F9FFBD681FFE30@VI1PR08MB5325.eurprd08.prod.outlook.com \
--to=tamar.christina@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hongtao.liu@intel.com \
--cc=nd@arm.com \
--cc=ook@ucw.cz \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).