public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "ook@ucw.cz" <ook@ucw.cz>,
	"hongtao.liu@intel.com" <hongtao.liu@intel.com>
Subject: RE: [PATCH 2/2] middle-end : Add support for complex pattern detection for Add, Mul, FMA and FMS
Date: Mon, 16 Nov 2020 15:54:30 +0000	[thread overview]
Message-ID: <VI1PR08MB532584C37C8F6A2F9FFBD681FFE30@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2011161509060.10073@p653.nepu.fhfr.qr>

Hi Richi, thanks for the review!

Just a quick comment on one of the questions asked:

> -----Original Message-----
> From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Monday, November 16, 2020 3:12 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> hongtao.liu@intel.com
> Subject: Re: [PATCH 2/2] middle-end : Add support for complex pattern
> detection for Add, Mul, FMA and FMS
> 
> On Sat, 14 Nov 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch series adds support for SLP vectorization of complex instructions
> [1].
> >
> > These instructions exist only in their vector forms and require you to
> recognize
> > two statements in parallel.  Complex operations usually require a permute
> due to
> > the fact that the real and imaginary numbers are stored intermixed but
> these vector
> > instructions expect this and no longer need the compiler to generate a
> permute.
> >
> > The pass is able to insert additional permutes inside the SLP tree in order to
> > change the data flow such that the new instructions can be used.  These
> permutes
> > are then optimized out later during optimize_slp.
> >
> > These instructions also support rotations along the Argand plane, as such
> the operands
> > have to be re-ordered to coincide with their load group.
> >
> > For now, this patch only adds support for:
> >
> >   * Complex Addition with rotation of 90 and 270.
> >
> >   Addition with rotation of the second argument around the Argand plane.
> >     Supported rotations are 90 and 180.
> >
> >     c = a + (b * I) and c = a + (b * I * I * I)
> >
> >   * Complex Multiplication and Multiplication where one operand is
> conjucated.
> >
> >   Complex multiplication and Conjucate Complex multiplication of the
> second
> >      parameter.
> >
> >     c = a * b and c = a * conj (b)
> >
> >   * Complex FMA and FMA where one operand is conjucated.
> >   * Complex FMS and FMS where one operand is conjucated.
> >
> >   Complex FMLA, Conjucate FMLA of the second parameter and FMLS.
> >
> >     c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)
> >
> >   For the conjucate cases it supports under fast-math that the operands
> that is
> >   being conjucated be flipped by flipping the arguments to the optab.  This
> >   allows it to support c = conj (a) * b and c += conj (a) * b.
> >
> >   where a, b and c are complex numbers.
> >
> > Complex dot-product is not currently supported in this patch set as
> build_slp fails
> > for it.  This will be provided as a future patch.
> >
> > These are supported for both integer and floating point and as such these
> don't look
> > for real or imaginary pairs but instead rely on the early lowering of complex
> > numbers by GCC and canonization of the operations such that it just
> recognizes any
> > instruction sequence matching the operations requested.
> >
> > To be safe when the it is not sure it can support the operation or if it finds
> > something it does not understand it backs off.  This is done by looking at
> the
> > order of the loads below the nodes.  Anything that changes the order like a
> > VEC_PERM is taken into account.  The order of the loads define which
> operation
> > is being done.   For instance a complex add with conjucate always has the
> second
> > loads in non-contiguous order.   Because this information is queried often it
> is
> > heavily cached and because we never modify nodes in place we can cache
> the
> > operation through all SLP instances.
> >
> > When SLP failes (e.g. due to costing) we dissolve the changes and restore
> the
> > previous relevancies of the stmts that were in a pattern before.
> >
> > [1] https://developer.arm.com/documentation/ddi0487/fc/
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? (working on moving tests to middend IFN tests instead).
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-slp-patterns.c: New file.
> > 	* Makefile.in: Use it.
> > 	* doc/passes.texi: Document it.
> > 	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270,
> > 	COMPLEX_MUL, COMPLEX_MUL_CONJ, COMPLEX_FMA,
> COMPLEX_FMA_CONJ):
> > 	COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
> > 	* optabs.def (cadd90_optab, cadd270_optab, cmul_optab,
> cmul_conj_optab,
> > 	cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab): New.
> > 	* doc/md.texi: Document them
> >
> > -- inline copy of patch --
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index
> 273654cfa2525099ed0d9adc2f9085c8408b096f..4b556bdbbc2f989d894ef2dbe
> 894318858eaa7aa 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1646,6 +1646,7 @@ OBJS = \
> >  	tree-vect-loop.o \
> >  	tree-vect-loop-manip.o \
> >  	tree-vect-slp.o \
> > +	tree-vect-slp-patterns.o \
> >  	tree-vectorizer.o \
> >  	tree-vector-builder.o \
> >  	tree-vrp.o \
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index
> 813875b973bc73920f1195c4897ce3491ca713c2..68792e4c3eeca2bce0112fc28e
> ba38700e66dc00 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -1493,7 +1493,7 @@ can be described by a series of letters for each
> operand.  The overall
> >  constraint for an operand is made from the letters for this operand
> >  from the first alternative, a comma, the letters for this operand from
> >  the second alternative, a comma, and so on until the last alternative.
> > -All operands for a single instruction must have the same number of
> > +All operands for a single instruction must have the same number of
> >  alternatives.
> >  @ifset INTERNALS
> >  Here is how it is done for fullword logical-or on the 68000:
> > @@ -1558,17 +1558,17 @@ the first, 1 for the second alternative, etc.).
> @xref{Output Statement}.
> >  @end ifset
> >  @ifclear INTERNALS
> >
> > -So the first alternative for the 68000's logical-or could be written as
> > -@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r"
> > -(output): "irm" (input)}.  However, the fact that two memory locations
> > -cannot be used in a single instruction prevents simply using @code{"+rm"
> > -(output) : "irm" (input)}.  Using multi-alternatives, this might be
> > +So the first alternative for the 68000's logical-or could be written as
> > +@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r"
> > +(output): "irm" (input)}.  However, the fact that two memory locations
> > +cannot be used in a single instruction prevents simply using @code{"+rm"
> > +(output) : "irm" (input)}.  Using multi-alternatives, this might be
> >  written as @code{"+m,r" (output) : "ir,irm" (input)}.  This describes
> > -all the available alternatives to the compiler, allowing it to choose
> > +all the available alternatives to the compiler, allowing it to choose
> >  the most efficient one for the current conditions.
> >
> > -There is no way within the template to determine which alternative was
> > -chosen.  However you may be able to wrap your @code{asm} statements
> with
> > +There is no way within the template to determine which alternative was
> > +chosen.  However you may be able to wrap your @code{asm} statements
> with
> >  builtins such as @code{__builtin_constant_p} to achieve the desired
> results.
> >  @end ifclear
> >
> > @@ -3062,7 +3062,7 @@ instruction taking only the upper 16-bits of a
> >  16-bits being 0.
> >
> >  @item L
> > -Integer that is valid as an immediate operand for a
> > +Integer that is valid as an immediate operand for a
> >  shift instruction. Range 0 to 31.
> >
> >  @item M
> > @@ -3076,7 +3076,7 @@ Integer that is valid as an immediate operand for
> >  a custom instruction opcode. Range 0 to 255.
> >
> >  @item P
> > -An immediate operand for R2 andchi/andci instructions.
> > +An immediate operand for R2 andchi/andci instructions.
> >
> >  @item S
> >  Matches immediates which are addresses in the small
> > @@ -6132,6 +6132,97 @@ floating-point mode.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> > +@item @samp{cadd@var{m}@var{n}3}
> > +Perform a vector addition of complex numbers in operand 1 with operand
> 2
> > +rotated by @var{m} degrees around the argand plane and storing the
> result in
> > +operand 0.  The instruction must perform the operation on data that in
> GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> 
> So I'm not sure I'm happy with defining this in terms of 'complex
> numbers'.  At least we nowhere specify how complex numbers are laid out in
> vector registers.  For V4DF it could be any of {real0,imag0,real1,imag1},
> {real0,real1, imag0,imag1}, {imag0,real0,imag1,real1} or
> {imag0,imag1, real0,real1}.  While it's easy to give names to semantics
> this way I'd prefer to define this in terms of even/odd lanes of a
> vector.  Maybe it can refer to the aarch64 example of complex number
> instructions implementing such scheme.
> 
> So at _least_ we have to define the layout of complex components
> in the vector modes (preferably in a target independent way), say
> even lanes are real parts and odd lanes are imaginary parts?
> 
> > +
> > +The operation is only supported for vector modes @var{n} and with
> > +rotations @var{m} of 90 or 270.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmla@var{m}4} instruction pattern
> > +@item @samp{cmla@var{m}4}
> > +Perform a vector floating point multiply and accumulate of complex
> numbers
> > +in operand 0, operand 1 and operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmla_conj@var{m}4} instruction pattern
> > +@item @samp{cmla_conj@var{m}4}
> > +Perform a vector floating point multiply and accumulate of complex
> numbers
> > +in operand 0, operand 1 and the conjucate of operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmls@var{m}4} instruction pattern
> > +@item @samp{cmls@var{m}4}
> > +Perform a vector floating point multiply and subtract of complex numbers
> > +in operand 0, operand 1 and operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmls_conj@var{m}4} instruction pattern
> > +@item @samp{cmls_conj@var{m}4}
> > +Perform a vector floating point multiply and subtract of complex numbers
> > +in operand 0, operand 1 and the conjucate of operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmul@var{m}4} instruction pattern
> > +@item @samp{cmul@var{m}4}
> > +Perform a vector floating point multiplication of complex numbers in
> operand 0
> > +and operand 1.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmul_conj@var{m}4} instruction pattern
> > +@item @samp{cmul_conj@var{m}4}
> > +Perform a vector floating point multiplication of complex numbers in
> operand 0
> > +and the conjucate of operand 1.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> >  @cindex @code{ffs@var{m}2} instruction pattern
> >  @item @samp{ffs@var{m}2}
> >  Store into operand 0 one plus the index of the least significant 1-bit
> > @@ -7409,7 +7500,7 @@ If this pattern is not defined, then a
> @code{memory_barrier} pattern
> >  will be emitted, followed by a store of the value to the memory operand.
> >
> >  @cindex @code{atomic_compare_and_swap@var{mode}} instruction
> pattern
> > -@item @samp{atomic_compare_and_swap@var{mode}}
> > +@item @samp{atomic_compare_and_swap@var{mode}}
> >  This pattern, if defined, emits code for an atomic compare-and-swap
> >  operation with memory model semantics.  Operand 2 is the memory on
> which
> >  the atomic operation is performed.  Operand 0 is an output operand which
> > @@ -7424,7 +7515,7 @@ if the operation fails.
> >
> >  If memory referred to in operand 2 contains the value in operand 3, then
> >  operand 4 is stored in memory pointed to by operand 2 and fencing based
> on
> > -the memory model in operand 6 is issued.
> > +the memory model in operand 6 is issued.
> >
> >  If memory referred to in operand 2 does not contain the value in operand
> 3,
> >  then fencing based on the memory model in operand 7 is issued.
> > @@ -7500,9 +7591,9 @@ none of these are available a compare-and-swap
> loop will be used.
> >  @itemx @samp{atomic_fetch_or@var{mode}},
> @samp{atomic_fetch_and@var{mode}}
> >  @itemx @samp{atomic_fetch_xor@var{mode}},
> @samp{atomic_fetch_nand@var{mode}}
> >  These patterns emit code for an atomic operation on memory with
> memory
> > -model semantics, and return the original value. Operand 0 is an output
> > -operand which contains the value of the memory location before the
> > -operation was performed.  Operand 1 is the memory on which the atomic
> > +model semantics, and return the original value. Operand 0 is an output
> > +operand which contains the value of the memory location before the
> > +operation was performed.  Operand 1 is the memory on which the atomic
> >  operation is performed.  Operand 2 is the second operand to the binary
> >  operator.  Operand 3 is the memory model to be used by the operation.
> >
> > @@ -7859,7 +7950,7 @@ simplified) from the PDP-11 target:
> >          (plus:HI (match_dup 0)
> >                (const_int -1)))]
> >    ""
> > -
> > +
> >    @{
> >      if (which_alternative == 0)
> >        return "sob %0,%l1";
> > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> > index
> a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99
> a23386891a7b0c1 100644
> > --- a/gcc/doc/passes.texi
> > +++ b/gcc/doc/passes.texi
> > @@ -709,7 +709,8 @@ loop.
> >  The pass is implemented in @file{tree-vectorizer.c} (the main driver),
> >  @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific
> parts
> >  and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
> > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
> > +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
> >  Analysis of data references is in @file{tree-data-ref.c}.
> >
> >  SLP Vectorization.  This pass performs vectorization of straight-line code.
> The
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index
> 310d37aa53819791b5df1683afca831f08e5892a..acb7d9f3bdc757437d5492a652
> 144ba31c2ef702 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -277,12 +277,21 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST,
> scalb, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> >  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90,
> binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST,
> cadd270, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST,
> cmul_conj, binary)
> > +
> >
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> >
> >  /* Ternary math functions.  */
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST,
> cmla_conj, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST,
> cmls_conj, ternary)
> >
> >  /* Unary integer ops.  */
> >  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb,
> unary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index
> 78409aa14537d259bf90277751aac00d452a0d3f..19db9c00896cd08adfd20a0166
> 9990bbbebd79f1 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -290,6 +290,14 @@ OPTAB_D (atan_optab, "atan$a2")
> >  OPTAB_D (atanh_optab, "atanh$a2")
> >  OPTAB_D (copysign_optab, "copysign$F$a3")
> >  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > +OPTAB_D (cadd270_optab, "cadd270$a3")
> > +OPTAB_D (cmul_optab, "cmul$a3")
> > +OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> > +OPTAB_D (cmla_optab, "cmla$a4")
> > +OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> > +OPTAB_D (cmls_optab, "cmls$a4")
> > +OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
> >  OPTAB_D (cos_optab, "cos$a2")
> >  OPTAB_D (cosh_optab, "cosh$a2")
> >  OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..7a2c64aeb9f212cd3ee18adf11
> 0e6ac6696249fd
> > --- /dev/null
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -0,0 +1,1844 @@
> > +/* SLP - Pattern matcher on SLP trees
> > +   Copyright (C) 2020 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "target.h"
> > +#include "rtl.h"
> > +#include "tree.h"
> > +#include "gimple.h"
> > +#include "tree-pass.h"
> > +#include "ssa.h"
> > +#include "optabs-tree.h"
> > +#include "insn-config.h"
> > +#include "recog.h"		/* FIXME: for insn_data */
> > +#include "fold-const.h"
> > +#include "stor-layout.h"
> > +#include "gimple-iterator.h"
> > +#include "cfgloop.h"
> > +#include "tree-vectorizer.h"
> > +#include "langhooks.h"
> > +#include "gimple-walk.h"
> > +#include "dbgcnt.h"
> > +#include "tree-vector-builder.h"
> > +#include "vec-perm-indices.h"
> > +#include "gimple-fold.h"
> > +#include "internal-fn.h"
> > +
> > +/* SLP Pattern matching mechanism.
> > +
> > +  This extension to the SLP vectorizer allows one to transform the
> generated SLP
> > +  tree based on any pattern.  The difference between this and the normal
> vect
> > +  pattern matcher is that unlike the former, this matcher allows you to
> match
> > +  with instructions that do not belong to the same SSA dominator graph.
> > +
> > +  The only requirement that this pattern matcher has is that you are only
> > +  only allowed to either match an entire group or none.
> > +
> > +  The pattern matcher currently only allows you to perform replacements
> to
> > +  internal functions.
> > +
> > +  Once the patterns are matched it is one way, these cannot be undone.  It
> is
> > +  currently not supported to match patterns recursively.
> > +
> > +  To add a new pattern, implement the vect_pattern class and add the
> type to
> > +  slp_patterns.
> > +
> > +*/
> > +
> >
> +/*********************************************************
> **********************
> > + * vect_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* Default implementation of recognize that peforms matching, validation
> and
> > +   replacement of nodes but that can be overriden if required.  */
> > +
> > +bool
> > +vect_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > +			 vec_info *vinfo)
> > +{
> > +  if (this->matches (perm_cache))
> > +    {
> > +      tree vectype = SLP_TREE_VECTYPE (*this->m_node);
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			 "Found %s pattern in SLP tree\n",
> > +			 this->get_name ());
> > +
> > +      if (this->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > +			     "Target supports %s vectorization with
> mode %T\n",
> > +			     internal_fn_name (this->get_ifn ()), vectype);
> > +	}
> > +      else
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    {
> > +	      if (!vectype)
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				 "Target does not support vector type
> for %T\n",
> > +				 SLP_TREE_DEF_TYPE (*this->m_node));
> > +	      else
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				 "Target does not support %s for vector type
> "
> > +				 "%T\n", internal_fn_name (this->get_ifn ()),
> > +				 vectype);
> > +	    }
> > +          return false;
> > +	}
> > +    }
> > +  else
> > +    return false;
> > +
> > +  if (!this->validate_p (perm_cache))
> > +    {
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			"transformation for %s not valid due to post "
> > +			"condition\n",
> > +			internal_fn_name (this->get_ifn ()));
> > +      return false;
> > +    }
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "Creating vec
> patterns\n");
> > +
> > +  gcall* call_stmt = this->build (vinfo);
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "\t %p stmt: %G",
> > +		     *this->m_node, call_stmt);
> > +
> > +  return true;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * General helper types
> > +
> **********************************************************
> ********************/
> > +
> > +/* The COMPLEX_OPERATION enum denotes the possible pair of
> operations that can
> > +   be matched when looking for expressions that we are interested
> matching for
> > +   complex numbers addition and mla.  */
> > +
> > +typedef enum _complex_operation : unsigned {
> > +  PLUS_PLUS,
> > +  MINUS_PLUS,
> > +  PLUS_MINUS,
> > +  MULT_MULT,
> > +  CMPLX_NONE
> > +} complex_operation_t;
> > +
> > +/* A simple pair type used for ordering of combine operations.  */
> > +
> > +typedef struct map_ { int a; int b; } map_t;
> > +
> >
> +/*********************************************************
> **********************
> > + * General helper functions
> > +
> **********************************************************
> ********************/
> > +
> > +/* Helper function of linear_loads_p that checks to see if the load
> permutation
> > +   is sequential and in monotonically increasing order of loads with no gaps.
> > +*/
> > +
> > +static inline bool
> > +is_linear_load_p (load_permutation_t loads)
> > +{
> > +  if (loads.length() == 0)
> > +    return false;
> > +
> > +  unsigned leader = loads[0];
> > +  unsigned load, i;
> > +  FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
> > +    if (load != ++leader)
> > +      return false;
> > +  return true;
> > +}
> > +
> > +
> > +/* Check to see if all loads rooted in ROOT are linear.  Linearity is
> > +   defined as having no gaps between values loaded.  */
> > +
> > +static load_permutation_t
> > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree
> root,
> > +		bool *linear)
> > +{
> > +  *linear = false;
> > +  if (!root)
> > +    return vNULL;
> > +
> > +  unsigned i;
> > +  load_permutation_t loads = vNULL;
> > +  load_permutation_t *tmp;
> > +
> > +  if ((tmp = perm_cache->get (root)) != NULL)
> > +    {
> > +      *linear = is_linear_load_p (*tmp);
> > +      return *tmp;
> > +    }
> > +
> > +  perm_cache->put (root, vNULL);
> > +
> > +  /* If it's a load node, then just read the load permute.  */
> > +  if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> > +    {
> > +      loads = SLP_TREE_LOAD_PERMUTATION (root);
> > +      perm_cache->put (root, loads);
> > +      if (!is_linear_load_p (loads))
> > +	return loads;
> > +    }
> > +  else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> > +    {
> > +       loads.create (SLP_TREE_LANES (root));
> > +       tree op;
> > +       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op)
> > +	 {
> > +	   gimple *defstmt = SSA_NAME_DEF_STMT (op);
> > +	   if (!is_gimple_assign (defstmt))
> > +	     return vNULL;
> > +
> > +	   switch (gimple_assign_rhs_code (defstmt))
> > +	   {
> > +	     case IMAGPART_EXPR:
> > +	       loads.safe_push (1);
> > +	       break;
> > +	     case REALPART_EXPR:
> > +	       loads.safe_push (0);
> > +	       break;
> > +	     default:
> > +	       {
> > +		 loads.release ();
> > +		 return vNULL;
> > +	       }
> > +	   }
> > +	 }
> > +
> > +       perm_cache->put (root, loads);
> > +       if (!is_linear_load_p (loads))
> > +	 return loads;
> > +    }
> > +  else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> > +    return vNULL;
> > +
> > +
> > +
> > +  auto_vec<load_permutation_t> all_loads;
> > +  bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> > +
> > +  slp_tree child;
> > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > +    {
> > +      /* Seed it with a value to prevent infinite recursions.  */
> > +      loads = linear_loads_p (perm_cache, child, linear);
> > +      if ((!*linear && !is_perm) || !loads.exists ())
> > +	return loads;
> > +
> > +      all_loads.safe_push (loads);
> > +    }
> > +
> > +  if (is_perm)
> > +    {
> > +      lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
> > +      load_permutation_t nloads;
> > +      nloads.create (SLP_TREE_LANES (root));
> > +      nloads.quick_grow (SLP_TREE_LANES (root));
> > +      for (i = 0; i < SLP_TREE_LANES (root); i++)
> > +	nloads[i] = all_loads[perm[i].first][perm[i].second];
> > +
> > +      perm_cache->put (root, nloads);
> > +      if (!is_linear_load_p (nloads))
> > +	return nloads;
> > +      loads = nloads;
> > +    }
> > +
> > +  perm_cache->put (root, loads);
> > +  *linear = true;
> > +  return loads;
> > +}
> > +
> > +/* Builds a permutation group from the operands in OPS and stores it in
> BLOCKS.
> > +   The group describes how to combine the operators to get a valid linear
> node.
> > +
> > +   This is used when combining multiple children from a two_operators
> node into
> > +   one using a lane permute to select the appropriate lane. As an example
> the
> > +   permute { [0 0] [1 4] [2 2] [3 3] [1 4] [5 5] } says the nodes which occur
> > +   twice in a group, e.g [0 0] only needs itself to possibly be made linear
> > +   whereas [1 4] means to combine the nodes 1 and 4. */
> > +
> > +static void
> > +vect_build_perm_groups (slp_tree_to_load_perm_map_t *perm_cache,
> > +			map_t *blocks, vec<slp_tree> ops)
> > +{
> > +  slp_tree op;
> > +  unsigned i;
> > +  bool is_linear = false;
> > +  unsigned min_eq = -1, max_eq = 0;
> > +  unsigned min_idx = 0, max_idx = 0;
> > +  FOR_EACH_VEC_ELT (ops, i, op)
> > +    {
> > +      load_permutation_t perms = linear_loads_p (perm_cache, op,
> &is_linear);
> > +      unsigned x, imin = -1, imax = 0;
> > +      for (x = 0; x < perms.length () && !is_linear; x++)
> > +	{
> > +	  imin = MIN (imin, perms[x]);
> > +	  imax = MAX (imax, perms[x]);
> > +	}
> > +
> > +	if (imin != imax || perms.length () == 0 || is_linear)
> > +	  blocks[i] = {i, i};
> > +	else
> > +	  {
> > +	    if (imin <= min_eq)
> > +	      {
> > +		min_eq = imin;
> > +		min_idx = i;
> > +	      }
> > +
> > +	    if (imin >= max_eq)
> > +	      {
> > +		max_eq = imin;
> > +		max_idx = i;
> > +	      }
> > +	  }
> > +    }
> > +
> > +  /* Now fill in the gap.  */
> > +  blocks[min_idx] = {min_idx, max_idx};
> > +  blocks[max_idx] = {min_idx, max_idx};
> > +
> > +  if (dump_enabled_p ())
> > +    {
> > +      dump_printf_loc (MSG_NOTE, vect_location, "pattern group: { ");
> > +      for (i = 0; i < ops.length (); i++)
> > +	dump_printf (MSG_NOTE,"[%d %d] ", blocks[i].a, blocks[i].b);
> > +      dump_printf (MSG_NOTE,"}\n");
> > +    }
> > +
> > +}
> > +
> > +/* This function attempts to make a node rooted in NODE with parent
> PARENT
> > +   linear.  If the node if already linear than the node itself is returned
> > +   in RESULT.
> > +
> > +   If the node is not linear then a new VEC_PERM_EXPR node is created
> with a
> > +   lane permute that when applied will make the node linear.   If such a
> > +   permute cannot be created then FALSE is returned from the function.
> > +
> > +   Here linearity is defined as having a sequential, monotically increasing
> > +   load position inside the load permute generated by the loads reachable
> from
> > +   NODE.  */
> > +
> > +static bool
> > +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache,
> > +		      slp_tree parent, slp_tree node, slp_tree *result)
> > +{
> > +  bool is_linear = false;
> > +  unsigned x, val;
> > +  load_permutation_t load_perm = linear_loads_p (perm_cache, node,
> &is_linear);
> > +  if (is_linear)
> > +    {
> > +      *result = node;
> > +      return true;
> > +    }
> > +
> > +  /* Attempt to linearise the permute.  */
> > +  vec<std::pair<unsigned, unsigned> > zipped;
> > +  zipped.create (load_perm.length ());
> > +  FOR_EACH_VEC_ELT (load_perm, x, val)
> > +    zipped.quick_push (std::make_pair (val, x));
> > +
> > +  typedef const std::pair<unsigned, unsigned>* cmp_t;
> > +  zipped.qsort ([](const void *a, const void *b) -> int
> > +    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> > +
> > +  /* Verify if we have a linear permute sequence.  */
> > +  if (zipped.length () > 0)
> > +    {
> > +      unsigned leader = zipped[0].first;
> > +      for (x = 1; x < zipped.length (); x++)
> > +	if(!(is_linear = (zipped[x].first == ++leader)))
> > +	  break;
> > +    }
> > +
> > +  if (!is_linear)
> > +    {
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			"Loads could not be made linear %p\n",
> > +			node);
> > +      zipped.release ();
> > +      return false;
> > +  }
> > +
> > +  for (x = 0; x < zipped.length (); x++)
> > +    zipped[x].first = 0;
> > +
> > +  /* Create the new permute node and store it instead.  */
> > +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > +  SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
> > +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > +  SLP_TREE_CHILDREN (vnode).quick_push (node);
> > +  SLP_TREE_REF_COUNT (vnode) = 1;
> > +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
> > +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> (parent);
> > +  *result = vnode;
> > +  return is_linear;
> > +}
> > +
> > +/* Helper utility to check to see if the permutation PERM is one that can
> be
> > +   used in a node combination operation.  This is defined as the permute
> not
> > +   having all the elements being the same. e.g [0 0].  */
> > +
> > +static inline bool
> > +vect_can_combine_node_p (load_permutation_t perm, bool is_linear)
> > +{
> > +  if (is_linear)
> > +    return false;
> > +
> > +  unsigned i, x;
> > +  FOR_EACH_VEC_ELT (perm, i, x)
> > +    if (perm[0] != x)
> > +      return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* This function combines the nodes in MAP together to make a new
> node using a
> > +   lane permute.  The nodes to combine are stored in ENTRIES and the
> resulting
> > +   node is returned in RESULT.
> > +
> > +   If the nodes are already linear then this function fails and returns FALSE.
> > +   Otherwise it returns the new node and TRUE.  */
> > +
> > +static bool
> > +vect_slp_make_combine_linear (slp_tree_to_load_perm_map_t
> *perm_cache,
> > +			      slp_tree parent, vec<slp_tree> entries, map_t
> map,
> > +			      slp_tree *result)
> > +{
> > +  if (map.a == map.b)
> > +    return false;
> > +
> > +  slp_tree node_a = entries[map.a];
> > +  slp_tree node_b = entries[map.b];
> > +
> > +  bool is_a_linear = false;
> > +  bool is_b_linear = false;
> > +
> > +  load_permutation_t load_perm_a
> > +    = linear_loads_p (perm_cache, node_a, &is_a_linear);
> > +  if (!vect_can_combine_node_p (load_perm_a, is_a_linear))
> > +    return false;
> > +
> > +  load_permutation_t load_perm_b
> > +    = linear_loads_p (perm_cache, node_b, &is_b_linear);
> > +  if (!vect_can_combine_node_p (load_perm_b, is_b_linear))
> > +    return false;
> > +
> > +  if (!load_perm_a.exists () || !load_perm_b.exists ())
> > +    return false;
> > +
> > +  /* Now we need to figure which node is first.  */
> > +  auto_vec<slp_tree> nodes;
> > +  nodes.create (2);
> > +  vec<std::pair<unsigned, unsigned> > perm;
> > +  perm.create (2);
> > +  if (load_perm_a[0] < load_perm_b[0])
> > +    {
> > +      perm.quick_push (std::make_pair (0, 0));
> > +      perm.quick_push (std::make_pair (1, 0));
> > +    }
> > +  else
> > +    {
> > +      perm.quick_push (std::make_pair (1, 0));
> > +      perm.quick_push (std::make_pair (0, 0));
> > +    }
> > +
> > +  nodes.quick_push (node_a);
> > +  nodes.quick_push (node_b);
> > +
> > +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > +  SLP_TREE_LANE_PERMUTATION (vnode) = perm;
> > +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > +  SLP_TREE_CHILDREN (vnode).safe_splice (nodes);
> > +  SLP_TREE_REF_COUNT (vnode) = 1;
> > +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (parent);
> > +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> (parent);
> > +  *result = vnode;
> > +  return true;
> > +}
> > +
> > +/* Checks to see of the expression represented by NODE is a gimple
> assign with
> > +   code CODE.  */
> > +
> > +static inline bool
> > +vect_match_expression_p (slp_tree node, tree_code code)
> > +{
> > +  if (!node
> > +      || !SLP_TREE_REPRESENTATIVE (node))
> > +    return false;
> > +
> > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> (node));
> > +  if (!is_gimple_assign (expr)
> > +      || gimple_assign_rhs_code (expr) != code)
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Checks to see if the expression represented by NODE is a call to the
> internal
> > +   function FN.  */
> > +
> > +static inline bool
> > +vect_match_call_p (slp_tree node, internal_fn fn)
> > +{
> > +  if (!node
> > +      || !SLP_TREE_REPRESENTATIVE (node))
> > +    return false;
> > +
> > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> (node));
> > +  if (!gimple_call_internal_p (expr, fn))
> > +    return false;
> > +
> > +   return true;
> > +}
> > +
> > +/* Checks to see if NODE is a two_operands node whom's operands are
> calls to the
> > +   internal function FN.  */
> > +
> > +static bool
> > +vect_match_two_call_p (slp_tree node, internal_fn fn)
> > +{
> > +  if (!SLP_TREE_REPRESENTATIVE (node)
> > +      || SLP_TREE_CODE (node) != VEC_PERM_EXPR
> > +      || SLP_TREE_CHILDREN (node).length () != 2)
> > +    return false;
> > +
> > +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > +
> > +  return vect_match_call_p (children[0], fn)
> > +	 && vect_match_call_p (children[1], fn);
> > +}
> > +
> > +/* Check if the given lane permute in PERMUTES matches an alternating
> sequence
> > +   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.  Further mode
> > +   there resulting permute must be linear.   */
> > +
> > +static bool
> > +vect_check_lane_permute (lane_permutation_t &permutes,
> > +			 unsigned p0, unsigned p1)
> > +{
> > +  if (permutes.length () == 0)
> > +    return false;
> > +
> > +  unsigned val[2] = {p0, p1};
> > +  unsigned seed = permutes[0].second;
> > +  for (unsigned i = 0; i < permutes.length (); i++)
> > +    if (permutes[i].first != val[i % 2]
> > +	|| permutes[i].second != seed++)
> > +      return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* This function will match the two gimple expressions representing
> NODE1 and
> > +   NODE2 in parallel and returns the pair operation that represents the two
> > +   expressions in the two statements.
> > +
> > +   If match is successful then the corresponding complex_operation is
> > +   returned and the arguments to the two matched operations are
> returned in OPS.
> > +
> > +   If TWO_OPERANDS it is expected that the LANES of the parent
> VEC_PERM select
> > +   from the two nodes alternatingly.
> > +
> > +   If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
> > +
> > +   e.g. the following gimple statements
> > +
> > +   stmt 0 _39 = _37 + _12;
> > +   stmt 1 _6 = _38 - _36;
> > +
> > +   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
> > +*/
> > +
> > +static complex_operation_t
> > +vect_detect_pair_op (slp_tree node1, slp_tree node2,
> lane_permutation_t &lanes,
> > +		     bool two_operands = true, vec<slp_tree> *ops = NULL)
> > +{
> > +  complex_operation_t result = CMPLX_NONE;
> > +
> > +  if (vect_match_expression_p (node1, MINUS_EXPR)
> > +      && vect_match_expression_p (node2, PLUS_EXPR)
> > +      && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > +    result = MINUS_PLUS;
> > +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> > +	   && vect_match_expression_p (node2, MINUS_EXPR)
> > +	   && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > +    result = PLUS_MINUS;
> > +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> > +	   && vect_match_expression_p (node2, PLUS_EXPR))
> > +    result = PLUS_PLUS;
> > +  else if (vect_match_expression_p (node1, MULT_EXPR)
> > +	   && vect_match_expression_p (node2, MULT_EXPR))
> > +    result = MULT_MULT;
> > +
> > +  if (result != CMPLX_NONE && ops != NULL)
> > +    {
> > +      ops->create (2);
> > +      ops->quick_push (node1);
> > +      ops->quick_push (node2);
> > +    }
> > +  return result;
> > +}
> > +
> > +/* Overload of vect_detect_pair_op that matches against the
> representative
> > +   statements in the children of NODE.  It is expected that NODE has
> exactly
> > +   two children and when TWO_OPERANDS then NODE must be a
> VEC_PERM.  */
> > +
> > +static complex_operation_t
> > +vect_detect_pair_op (slp_tree node, bool two_operands = true,
> > +		     vec<slp_tree> *ops = NULL)
> > +{
> > +  if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> > +    return CMPLX_NONE;
> > +
> > +  if (SLP_TREE_CHILDREN (node).length () != 2)
> > +    return CMPLX_NONE;
> > +
> > +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > +  lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
> > +
> > +  return vect_detect_pair_op (children[0], children[1], lanes,
> two_operands,
> > +			      ops);
> > +}
> > +
> > +/* This function marks every statement that is being replaced during the
> > +   the pattern matching as PURE.  Normally when replacing a statement
> due
> > +   to a pattern we add the statement to the
> STMT_VINFO_PATTERN_DEF_SEQ of
> > +   the pattern that is replacing them.  In this case however this won't
> > +   work as when doing the replacement we are changing the nodes that
> are
> > +   used by the statements.  This means that when vectorized the SSA chain
> > +   is different than in the BB.
> > +
> > +   Declaring the statements as part of the sequence will then cause SSA
> > +   verification to fail as we may refer to statements that were not in the
> > +   original USE-DEF chain of the statement we are replacing.
> > +
> > +   The relevancy of the statements are saved and are later restored if SLP
> is to
> > +   be dissolved.  */
> > +
> > +static void
> > +vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree
> node)
> > +{
> > +  if (cache->add (node))
> > +    return;
> > +
> > +  unsigned i;
> > +  stmt_vec_info stmt_info;
> > +  slp_tree child;
> > +
> > +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> > +    {
> > +      if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info)))
> > +       return;
> > +
> > +      vect_save_relevancy (stmt_info);
> > +      STMT_VINFO_SLP_VECT_ONLY (stmt_info) = true;
> > +      STMT_SLP_TYPE (stmt_info) = pure_slp;
> > +    }
> > +
> > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > +    vect_mark_stmts_as_in_pattern (cache, child);
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* SLP Complex Numbers pattern matching.
> > +
> > +  As an example, the following simple loop:
> > +
> > +    double a[restrict N]; double b[restrict N]; double c[restrict N];
> > +
> > +    for (int i=0; i < N; i+=2)
> > +    {
> > +      c[i] = a[i] - b[i+1];
> > +      c[i+1] = a[i+1] + b[i];
> > +    }
> > +
> > +  which represents a complex addition on with a rotation of 90* around
> the
> > +  argand plane. i.e. if `a` and `b` were complex numbers then this would be
> the
> > +  same as `a + (b * I)`.
> > +
> > +  Here the expressions for `c[i]` and `c[i+1]` are independent but have to
> be
> > +  both recognized in order for the pattern to work.  As an SLP tree this is
> > +  represented as
> > +
> > +                +--------------------------------+
> > +                |       stmt 0 *_9 = _10;        |
> > +                |       stmt 1 *_15 = _16;       |
> > +                +--------------------------------+
> > +                                |
> > +                                |
> > +                                v
> > +                +--------------------------------+
> > +                |     stmt 0 _10 = _4 - _8;      |
> > +                |    stmt 1 _16 = _12 + _14;     |
> > +                | lane permutation { 0[0] 1[1] } |
> > +                +--------------------------------+
> > +                            |        |
> > +                            |        |
> > +                            |        |
> > +               +-----+      |        |      +-----+
> > +               |     |      |        |      |     |
> > +         +-----| { } |<-----+        +----->| { } --------+
> > +         |     |     |   +------------------|     |       |
> > +         |     +-----+   |                  +-----+       |
> > +         |        |      |                                |
> > +         |        |      |                                |
> > +         |        +------|------------------+             |
> > +         |               |                  |             |
> > +         v               v                  v             v
> > +     +--------------------------+     +--------------------------------+
> > +     |     stmt 0 _8 = *_7;     |     |        stmt 0 _4 = *_3;        |
> > +     |    stmt 1 _14 = *_13;    |     |       stmt 1 _12 = *_11;       |
> > +     | load permutation { 1 0 } |     |    load permutation { 0 1 }    |
> > +     +--------------------------+     +--------------------------------+
> > +
> > +  The pattern matcher allows you to replace both statements 0 and 1 or
> none at
> > +  all.  Because this operation is a two operands operation the actual nodes
> > +  being replaced are those in the { } nodes.  The actual scalar statements
> > +  themselves are not replaced or used during the matching but instead the
> > +  SLP_TREE_REPRESENTATIVE statements are inspected.  You are also
> allowed to
> > +  replace and match on any number of nodes.
> > +
> > +  Because the pattern matcher matches on the representative statement
> for the
> > +  SLP node the case of two_operators it allows you to match the children
> of the
> > +  node.  This is done using the method `recognize ()`.
> > +
> > +*/
> > +
> > +/* The complex_pattern class contains common code for pattern
> matchers that work
> > +   on complex numbers.  These provide functionality to allow de-
> construction and
> > +   validation of sequences depicting/transforming REAL and IMAG pairs.  */
> > +
> > +class complex_pattern : public vect_pattern
> > +{
> > +  protected:
> > +    vec<slp_tree> *m_nodes = NULL;
> > +    complex_pattern (slp_tree *node)
> > +      : vect_pattern (node)
> > +    { }
> > +
> > +  public:
> > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > +    gcall *build (vec_info *);
> > +};
> > +
> > +/* The post transform and validation function for the complex number
> > +   patterns.  This will re-arrange the tree and re-organize the nodes such
> > +   that they can be used by the complex number instructions that are to be
> > +   created.  It does this by doing the following steps:
> > +
> > +   For each node that isn't already linear a new permute node is created
> which
> > +   when applied makes the node linear.  If such permutation does not exist
> then
> > +   the function returns FALSE.  For nodes that are already linear no work is
> > +   done.   */
> > +
> > +bool
> > +complex_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  slp_tree node;
> > +  unsigned ix;
> > +  auto_vec<slp_tree> workset;
> > +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > +    {
> > +      auto_vec<slp_tree> nodes;
> > +      nodes.create (this->m_num_args);
> > +      slp_tree tmp = NULL;
> > +      auto_vec<slp_tree> new_nodes;
> > +
> > +      unsigned i;
> > +      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
> > +	{
> > +	  slp_tree vnode = NULL;
> > +	  if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > +	    {
> > +	      nodes.quick_push (vnode);
> > +	      if (vnode != tmp)
> > +		new_nodes.safe_push (vnode);
> > +	    }
> > +	  else
> > +	    {
> > +	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> > +		delete tmp;
> > +
> > +	      FOR_EACH_VEC_ELT (workset, i, tmp)
> > +		delete tmp;
> > +
> > +	      nodes.release();
> > +	      return false;
> > +	    }
> > +	}
> > +
> > +      slp_tree new_node
> > +	= vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> > +				    SLP_TREE_CHILDREN (node).length ());
> > +      SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> > +      SLP_TREE_LANE_PERMUTATION (new_node) =
> SLP_TREE_LANE_PERMUTATION (node);
> > +      SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> > +      SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node);
> > +      SLP_TREE_REPRESENTATIVE (new_node) =
> SLP_TREE_REPRESENTATIVE (node);
> > +      SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> > +      SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> > +
> > +      workset.safe_push (new_node);
> > +    }
> > +
> > +  FOR_EACH_VEC_ELT (workset, ix, node)
> > +    SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> > +
> > +  return true;
> > +}
> > +
> > +
> > +/* Create a replacement pattern statement for each node in m_node and
> inserts
> > +   the new statement into m_node as the new representative statement.
> The old
> > +   statement is marked as being in a pattern defined by the new statement.
> The
> > +   statement is created as call to internal function IFN with m_num_args
> > +   arguments.
> > +
> > +   Futhermore the new pattern is also added to the vectorization
> information
> > +   structure VINFO and the old statement STMT_INFO is marked as unused
> while
> > +   the new statement is marked as used and the number of SLP uses of the
> new
> > +   statement is incremented.
> > +
> > +   The newly created SLP nodes are marked as SLP only and will be
> dissolved
> > +   if SLP is aborted.
> > +
> > +   The newly created gimple call is returned and the BB remains unchanged.
> > +
> > +   This default method is designed to only match against simple operands
> where
> > +   all the input and output types are the same.
> > +*/
> > +
> > +gcall *
> > +complex_pattern::build (vec_info *vinfo)
> > +{
> > +  stmt_vec_info stmt_info;
> > +
> > +  auto_vec<tree> args;
> > +  args.create (this->m_num_args);
> > +  args.quick_grow_cleared (this->m_num_args);
> > +  slp_tree node;
> > +  unsigned ix;
> > +  stmt_vec_info call_stmt_info;
> > +  gcall *call_stmt = NULL;
> > +
> > +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> 
> So elsewhere I see m_nodes assigned from SLP_TREE_CHILDREN
> of m_node.  I assume m_node is the VEC_PERM of the two-operator
> operation.  I expected we create a call for m_node here, not
> for all children?

It depends on the pattern. For most, particularly those of two_operands nodes,
unless I have mistaken something it's the children of the node that determines
what operation is being performed isn't it?

So for  ADD for instance, the tree is

node 0x49aeae0 (max_nunits=4, refcnt=2)
 stmt 0 REALPART_EXPR <*_10> = _4;
 stmt 1 IMAGPART_EXPR <*_10> = _22;
 children 0x49b0b20
node 0x49b0b20 (max_nunits=4, refcnt=2)
 stmt 0 _4 = _13 - _6;
 stmt 1 _22 = _8 + _12;
 lane permutation { 0[0] 1[1] }
 children 0x49b0430 0x49a73c0
node 0x49b0430 (max_nunits=1, refcnt=1)
 { }
 children 0x49ad660 0x499e400
node 0x49ad660 (max_nunits=4, refcnt=3)
 stmt 0 _13 = REALPART_EXPR <*_3>;
 stmt 1 _12 = IMAGPART_EXPR <*_3>;
 load permutation { 0 1 }
node 0x499e400 (max_nunits=4, refcnt=3)
 stmt 0 _6 = IMAGPART_EXPR <*_5>;
 stmt 1 _8 = REALPART_EXPR <*_5>;
 load permutation { 1 0 }
node 0x49a73c0 (max_nunits=1, refcnt=1)
 { }
 children 0x49ad660 0x499e400

where the VEC_PERM node itself just contains the permute and the children of it
0x49b0430 and 0x49a73c0 contain the operations - and + and from what I understood
the lane permute defined how the input is deinterleaved between the two nodes.

So when replacing them it's those two that I need to change and not VEC_PERM itself
otherwise the node could contain a permute and an operation that needs to be vectorized?

Or did you mean I should dissolve the two_operand node completely at this point? I was trying
to minimize the number of changes done on each replacement, because then it means I can assume
that any information I have already calculated of the children of the node being replaced is still the same
when doing recursive matching.

The tree directly out of the matcher is 

    node 0x49aeae0 (max_nunits=4, refcnt=2)
     stmt 0 REALPART_EXPR <*_10> = _4;
     stmt 1 IMAGPART_EXPR <*_10> = _22;
     children 0x49b0b20
    node 0x49b0b20 (max_nunits=4, refcnt=2)
     stmt 0 _4 = _13 - _6;
     stmt 1 _22 = _8 + _12;
     lane permutation { 0[0] 1[1] }
     children 0x4ade0c0 0x4ade220
    node 0x4ade0c0 (max_nunits=1, refcnt=1)
     { }
     SLP Only pattern:
      slp_patt_41 = .COMPLEX_ADD_ROT90 (_pa_43, _pa_42);
     children 0x49ad660 0x49b60f0
    node 0x49ad660 (max_nunits=4, refcnt=3)
     stmt 0 _13 = REALPART_EXPR <*_3>;
     stmt 1 _12 = IMAGPART_EXPR <*_3>;
     load permutation { 0 1 }
    node 0x49b60f0 (max_nunits=1, refcnt=1)
     { }
     lane permutation { 0[1] 0[0] }
     children 0x499e400
    node 0x499e400 (max_nunits=4, refcnt=3)
     stmt 0 _6 = IMAGPART_EXPR <*_5>;
     stmt 1 _8 = REALPART_EXPR <*_5>;
     load permutation { 1 0 }
    node 0x4ade220 (max_nunits=1, refcnt=1)
     { }
     SLP Only pattern:
      slp_patt_32 = .COMPLEX_ADD_ROT90 (_pa_40, _pa_33);
     children 0x49ad660 0x4ade180
    node 0x4ade180 (max_nunits=1, refcnt=1)
     { }
     lane permutation { 0[1] 0[0] }
     children 0x499e400

The one exception to this is when the node being replaced isn't a two_operands node,
like is the case with FMA.  In that case the node is just an ADD node which itself is replaced.
This is why it's a variable, so I can tell it where to perform the replacement.

Thanks,
Tamar

> 
> > +    {
> > +      /* Calculate the location of the statement in NODE to replace.  */
> > +      stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > +      gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> > +      tree type = TREE_TYPE (gimple_get_lhs (old_stmt));
> > +
> > +      /* Create the argument set for use by gimple_build_call_internal_vec.
> */
> > +      for (unsigned i = 0; i < this->m_num_args; i++)
> > +	{
> > +	  tree var = make_temp_ssa_name (type, old_stmt, "_pa");
> 
> Why do you need to create N new SSA names (all with the same
> definition statement)?  It looks like this new "scalar" stmt
> is only a template stored in SLP_TREE_REPRESENTATIVE
> and supposedly there needs to be "something" in the actual
> arguments it make us not crash later?  Can't you for example
> simply push the old_stmt LHS N times?
> 
> For VEC_PERM I originally wanted to set SLP_TREE_REPRESENTATIVE
> to NULL_TREE but we don't like that obviously.  So long-term
> we might want to change SLP_TREE_CODE to a match_code to allow
> tree codes and internal-functions and "ignore" the representative
> in case it is not ERROR_MARK.
> 
> > +	  args[i] = var;
> > +	}
> > +
> > +      /* Create the new pattern statements.  */
> > +      call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
> > +      tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
> > +      gimple_call_set_lhs (call_stmt, var);
> > +      gimple_set_location (call_stmt, gimple_location (old_stmt));
> > +      gimple_call_set_nothrow (call_stmt, true);
> > +
> > +      /* Adjust the book-keeping for the new and old statements for use
> during
> > +	 SLP.  This is required to get the right VF and statement during SLP
> > +	 analysis.  These changes are created after relevancy has been set for
> > +	 the nodes as such we need to manually update them.  Any changes
> will be
> > +	 undone if SLP is cancelled.  */
> > +      call_stmt_info
> > +	= vinfo->add_pattern_stmt (call_stmt, stmt_info);
> > +
> > +      /* add_pattern_stmt can't be done in vect_mark_pattern_stmts
> because
> > +	 the non-SLP pattern matchers already have added the statement to
> VINFO
> > +	 by the time it is called.  Some of them need to modify the returned
> > +	 stmt_info.  vect_mark_pattern_stmts is called by recog_pattern and
> it
> > +	 would increase the size of each pattern with boilerplate code to
> make
> > +	 the call there.  */
> > +      vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
> > +			       SLP_TREE_VECTYPE (node));
> > +      STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > +      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> > +
> > +      /* Since we are replacing all the statements in the group with the same
> > +	 thing it doesn't really matter.  So just set it every time a new stmt
> > +	 is created.  */
> > +      SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
> > +      SLP_TREE_CODE (node) = CALL_EXPR;
> > +    }
> > +  return call_stmt;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_add_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_add_pattern : public complex_pattern
> > +{
> > +  protected:
> > +    complex_add_pattern (slp_tree *node)
> > +      : complex_pattern (node)
> > +    {
> > +      this->m_num_args = 2;
> > +    }
> > +
> > +  public:
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_add_pattern (node);
> > +    }
> > +
> > +    const char* get_name ()
> > +    {
> > +      return "Complex Addition";
> > +    }
> > +
> > +    bool matches (slp_tree_to_load_perm_map_t *);
> > +    bool matches (complex_operation_t op,
> slp_tree_to_load_perm_map_t *,
> > +		  vec<slp_tree> ops);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex addition pattern in SLP
> tree.
> > +
> > +   If no match is found then IFN is set to IFN_LAST.
> > +   This function matches the patterns shaped as:
> > +
> > +   c[i] = a[i] - b[i+1];
> > +   c[i+1] = a[i+1] + b[i];
> > +
> > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> is
> > +   expected to be in OP1 and the initial match operands in args0.  */
> > +
> > +bool
> > +complex_add_pattern::matches (complex_operation_t op,
> > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > +			      vec<slp_tree> ops)
> > +{
> > +  this->m_ifn = IFN_LAST;
> > +  this->m_ops.safe_splice (ops);
> > +
> > +  /* Find the two components.  Rotation in the complex plane will modify
> > +     the operations:
> > +
> > +      * Rotation  0: + +
> > +      * Rotation 90: - +
> > +      * Rotation 180: - -
> > +      * Rotation 270: + -
> > +
> > +      Rotation 0 and 180 can be handled by normal SIMD code, so we don't
> need
> > +      to care about them here.  */
> > +  if (op == MINUS_PLUS)
> > +    this->m_ifn = IFN_COMPLEX_ADD_ROT90;
> > +  else if (op == PLUS_MINUS)
> > +    this->m_ifn = IFN_COMPLEX_ADD_ROT270;
> > +
> > +  /* verify that there is a permute, otherwise this isn't a pattern we
> > +     we support.  */
> > +  bool is_linear = false;
> > +  bool all_linear = true;
> > +  unsigned x;
> > +  for (x = 0; x < this->m_ops.length () && all_linear; x++)
> > +    {
> > +      linear_loads_p (perm_cache, this->m_ops[x], &is_linear);
> > +      all_linear = all_linear && is_linear;
> > +    }
> > +
> > +  if (this->m_ifn == IFN_LAST || all_linear)
> > +    return false;
> > +
> > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > +  return true;
> > +}
> > +
> > +bool
> > +complex_add_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> > +  return matches (op, perm_cache, this->m_ops);
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_mul_pattern
> > +
> **********************************************************
> ********************/
> > +
> > +/* Helper function of that looks for a match in the CHILDth child of NODE.
> The
> > +   child used is stored in RES.
> > +
> > +   If the match is successful then ARGS will contain the operands matched
> > +   and the complex_operation_t type is returned.  If match is not
> successful
> > +   then CMPLX_NONE is returned and ARGS is left unmodified.  */
> > +
> > +static inline complex_operation_t
> > +vect_match_call_complex_mla (slp_tree node, unsigned child,
> > +			     vec<slp_tree> *args = NULL, slp_tree *res = NULL)
> > +{
> > +  gcc_assert (child < SLP_TREE_CHILDREN (node).length ());
> > +
> > +  slp_tree data = SLP_TREE_CHILDREN (node)[child];
> > +
> > +  if (res)
> > +    *res = data;
> > +
> > +  return vect_detect_pair_op (data, false, args);
> > +}
> > +
> > +/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the
> first
> > +   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> > +
> > +   If a negate is found then the values in ARGS are reordered such that the
> > +   negate node is always the second one and the entry is replaced by the
> child
> > +   of the negate node.  */
> > +
> > +static inline bool
> > +vect_normalize_conj_loc (vec<slp_tree> args, bool *neg_first_p = NULL)
> > +{
> > +  gcc_assert (args.length () == 2);
> > +  bool neg_found = false;
> > +
> > +  if (vect_match_expression_p (args[0], NEGATE_EXPR))
> > +    {
> > +      std::swap (args[0], args[1]);
> > +      neg_found = true;
> > +      if (neg_first_p)
> > +	*neg_first_p = true;
> > +    }
> > +  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> > +    {
> > +      neg_found = true;
> > +      if (neg_first_p)
> > +	*neg_first_p = false;
> > +    }
> > +
> > +  if (neg_found)
> > +    args[1] = SLP_TREE_CHILDREN (args[1])[0];
> > +
> > +  return neg_found;
> > +}
> > +
> > +/* Helper function that checks to see if the load permutation PERM is on
> that
> > +   belongs to a duplication operation.  i.e. checks to see if all the values in
> > +   PERM are the same.  */
> > +
> > +static inline bool
> > +vect_is_dup (load_permutation_t perm, unsigned idx)
> > +{
> > +  for (unsigned i = 0; i < perm.length (); i++)
> > +    if (perm[i] != idx)
> > +     return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both
> MULT_EXPR
> > +   nodes but also that they represent an operation that is either a complex
> > +   multiplication or a complex multiplication by conjucated value.
> > +
> > +   Of the negation is expected to be in the first half of the tree (As
> required
> > +   by an FMS pattern) then NEG_FIRST is true.  If the operation is a
> conjucate
> > +   operation then CONJ_FIRST_OPERAND is set to indicate whether the
> first or
> > +   second operand contains the conjucate operation.  */
> > +
> > +static inline bool
> > +vect_validate_multiplication (slp_tree_to_load_perm_map_t
> *perm_cache,
> > +			     vec<slp_tree> left_op, vec<slp_tree> right_op,
> > +			     bool neg_first, bool *conj_first_operand)
> > +{
> > +  /* The presence of a negation indicates that we have either a conjucate
> or a
> > +     rotation.  We need to distinguish which one.  */
> > +  *conj_first_operand = false;
> > +
> > +  load_permutation_t perm;
> > +  bool is_linear;
> > +  /* Complex conjucates have the negation on the imaginary part of the
> > +     number where rotations affect the real component.  So check if the
> > +     negation is on a dup of lane 1.  */
> > +  perm = linear_loads_p (perm_cache, right_op[1], &is_linear);
> > +  if (is_linear || !vect_is_dup (perm, 1))
> > +    return false;
> > +
> > +  /* Check if the conjucate is on the second first or second operand.  The
> > +     order of the node with the conjucate value determines this, and the
> dup
> > +     node must be one of lane 0 of the same DR as the neg node.  */
> > +  perm = linear_loads_p (perm_cache, left_op[0], &is_linear);
> > +  if (is_linear)
> > +    {
> > +      perm = linear_loads_p (perm_cache, left_op[1], &is_linear);
> > +      if (is_linear)
> > +	return false;
> > +    }
> > +  else if (!neg_first)
> > +    *conj_first_operand = true;
> > +  else
> > +    return false;
> > +
> > +  if (!vect_is_dup (perm, 0))
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Helper function to help distinguish between a conjucate and a rotation
> in a
> > +   complex multiplication.  The operations have similar shapes but the
> order of
> > +   the load permutes are different.  This function returns TRUE when the
> order
> > +   is consistent with a multiplication or multiplication by conjucated
> > +   operand but returns FALSE if it's a multiplication by rotated operand.  */
> > +
> > +static inline bool
> > +vect_validate_multiplication (slp_tree_to_load_perm_map_t
> *perm_cache,
> > +			     vec<slp_tree> op, unsigned idx)
> > +{
> > +  /* The left node is the more common case, test it first.  */
> > +  bool is_linear;
> > +  load_permutation_t perm = linear_loads_p (perm_cache, op[0],
> &is_linear);
> > +  if (is_linear || !vect_is_dup (perm, idx))
> > +    {
> > +      perm = linear_loads_p (perm_cache, op[1], &is_linear);
> > +      if (is_linear || !vect_is_dup (perm, idx))
> > +	return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +class complex_mul_pattern : public complex_pattern
> > +{
> > +  protected:
> > +    /* Allocate enough space for FMA as well.  */
> > +    map_t m_blocks[6] = {};
> > +    bool m_inplace = false;
> > +    complex_mul_pattern (slp_tree *node)
> > +      : complex_pattern (node)
> > +    {
> > +      this->m_num_args = 2;
> > +    }
> > +
> > +  public:
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_mul_pattern (node);
> > +    }
> > +
> > +    const char* get_name ()
> > +    {
> > +      return "Complex Multiplication";
> > +    }
> > +
> > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > +    bool matches (slp_tree_to_load_perm_map_t *);
> > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > +		  vec<slp_tree>);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex multiply pattern in SLP
> tree
> > +   If the operation matches then IFN is set to the operation it matched
> > +   and the arguments to the two replacement statements are put in
> m_ops.
> > +
> > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > +   This function matches the patterns shaped as:
> > +
> > +   double ax = (b[i+1] * a[i]);
> > +   double bx = (a[i+1] * b[i]);
> > +
> > +   c[i] = c[i] - ax;
> > +   c[i+1] = c[i+1] + bx;
> > +
> > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> is
> > +   expected to be in OP1 and the initial match operands in args0.  */
> > +
> > +bool
> > +complex_mul_pattern::matches (complex_operation_t op,
> > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > +			      vec<slp_tree> /* ops */)
> > +{
> > +  this->m_ifn = IFN_LAST;
> > +
> > +  if (op != MINUS_PLUS)
> > +    return false;
> > +
> > +  slp_tree root = *this->m_node;
> > +  /* First two nodes must be a multiply.  */
> > +  auto_vec<slp_tree> muls;
> > +  if (vect_match_call_complex_mla (root, 0) != MULT_MULT
> > +      || vect_match_call_complex_mla (root, 1, &muls) != MULT_MULT)
> > +    return false;
> > +
> > +  /* Now operand2+4 may lead to another expression.  */
> > +  auto_vec<slp_tree> left_op, right_op;
> > +  left_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> > +  right_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> > +
> > +  bool neg_first;
> > +  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> > +
> > +  if (!is_neg)
> > +    {
> > +      /* A multiplication needs to multiply agains the real pair, otherwise
> > +	 the pattern matches that of FMS.   */
> > +      if (!vect_validate_multiplication (perm_cache, left_op, 0))
> > +	return false;
> > +      this->m_ifn = IFN_COMPLEX_MUL;
> > +      this->m_ops.safe_splice (left_op);
> > +      this->m_ops.safe_splice (right_op);
> > +    }
> > +  else if (is_neg)
> > +    {
> > +      bool conj_first_operand;
> > +      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> > +					 neg_first, &conj_first_operand))
> > +	return false;
> > +
> > +      this->m_ifn = IFN_COMPLEX_MUL_CONJ;
> > +      /* Check if the conjucate is on the first or second parameter.  */
> > +      if (!conj_first_operand)
> > +	{
> > +	  this->m_ops.safe_push (left_op[1]);
> > +	  this->m_ops.safe_push (left_op[0]);
> > +	}
> > +      else
> > +	{
> > +	  this->m_ops.safe_push (left_op[0]);
> > +	  this->m_ops.safe_push (left_op[1]);
> > +	}
> > +      this->m_ops.safe_push (right_op[1]);
> > +      this->m_ops.safe_push (right_op[0]);
> > +    }
> > +
> > +  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this-
> >m_ops);
> > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > +
> > +  return true;
> > +}
> > +
> > +bool
> > +complex_mul_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node);
> > +  return matches (op, perm_cache, this->m_ops);
> > +}
> > +
> > +
> > +/* Validates to see if the Complex MUL that we have matched is valid.
> This is
> > +   done through a combination of making nodes linear and combining
> nodes.  */
> > +
> > +bool
> > +complex_mul_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  slp_tree node;
> > +  unsigned ix;
> > +  hash_set<slp_tree> cache;
> > +  auto_vec<slp_tree> workset;
> > +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> > +    {
> > +      auto_vec<slp_tree> nodes;
> > +      nodes.create (this->m_num_args);
> > +      slp_tree tmp = NULL;
> > +      auto_vec<slp_tree> new_nodes;
> > +
> > +      unsigned i;
> > +      for (i = 0; i < this->m_num_args; i++)
> > +	{
> > +	  unsigned index = (ix * this->m_num_args) + i;
> > +	  map_t map = this->m_blocks[index];
> > +	  slp_tree vnode = NULL;
> > +	  bool needs_linear = map.a == map.b;
> > +	  tmp = this->m_ops[index];
> > +	  cache.add (tmp);
> > +	  if (needs_linear
> > +	      && vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > +	    {
> > +	      nodes.quick_push (vnode);
> > +	      if (vnode != tmp)
> > +		new_nodes.safe_push (vnode);
> > +	    }
> > +	  else if (!needs_linear
> > +		   && vect_slp_make_combine_linear (perm_cache, node,
> > +						    this->m_ops, map,
> &vnode))
> > +	    {
> > +	      nodes.quick_push (vnode);
> > +	      if (vnode != tmp)
> > +		new_nodes.safe_push (vnode);
> > +	    }
> > +	  else
> > +	    {
> > +	      if (dump_enabled_p ())
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				"stmts could not be made %s %p\n",
> > +				needs_linear ? "linear" : "linear/combined",
> > +				tmp);
> > +	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> > +		delete tmp;
> > +
> > +	      if (!m_inplace)
> > +		FOR_EACH_VEC_ELT (workset, i, tmp)
> > +		  delete tmp;
> > +
> > +	      nodes.release();
> > +	      return false;
> > +	    }
> > +
> > +	  vect_mark_stmts_as_in_pattern (&cache, node);
> > +	}
> > +
> > +      if (m_inplace)
> > +	{
> > +	  workset.safe_splice (nodes);
> > +	}
> > +      else
> > +	{
> > +	  slp_tree new_node
> > +	    = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> > +					SLP_TREE_CHILDREN (node).length
> ());
> > +	  SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> > +	  SLP_TREE_LANE_PERMUTATION (new_node)
> > +	    = SLP_TREE_LANE_PERMUTATION (node);
> > +	  SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> > +	  SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT
> (node);
> > +	  SLP_TREE_REPRESENTATIVE (new_node) =
> SLP_TREE_REPRESENTATIVE (node);
> > +	  SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> > +	  SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> > +
> > +	  workset.safe_push (new_node);
> > +	}
> > +    }
> > +
> > +  if (m_inplace)
> > +    {
> > +      SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > +      SLP_TREE_CHILDREN (*this->m_node).safe_splice (workset);
> > +    }
> > +  else
> > +    {
> > +      FOR_EACH_VEC_ELT (workset, ix, node)
> > +	SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_fma_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_fma_pattern : public complex_mul_pattern
> > +{
> > +  protected:
> > +    /* Temporary nodes cache as scope of the class so we don't
> > +       have to worry about freeing it. */
> > +    auto_vec<slp_tree> m_ncache;
> > +    /* List of nodes to mark as no longer being a mul.  The first
> > +       one is chosen and the last one is dangling.  */
> > +    auto_vec<slp_tree> m_mul_nodes;
> > +    complex_fma_pattern (slp_tree *node)
> > +      : complex_mul_pattern (node)
> > +    {
> > +      this->m_num_args = 3;
> > +    }
> > +
> > +  public:
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_fma_pattern (node);
> > +    }
> > +
> > +    const char* get_name ()
> > +    {
> > +      return "Complex FMA";
> > +    }
> > +
> > +    bool matches (slp_tree_to_load_perm_map_t *);
> > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > +		  vec<slp_tree> ops);
> > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > +};
> > +
> > +/* Helper function to "reset" a previously matched node and undo the
> changes
> > +   made enough so that the node is treated as an irrelevant node.  */
> > +
> > +static inline void
> > +vect_slp_reset_pattern (slp_tree node)
> > +{
> > +  stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE
> (node));
> > +  STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
> > +  STMT_SLP_TYPE (stmt_info) = pure_slp;
> > +  SLP_TREE_REPRESENTATIVE (node) = stmt_info;
> > +}
> > +
> > +/* Pattern matcher for trying to match complex multiply and accumulate
> > +   and multiply and subtract patterns in SLP tree.
> > +   If the operation matches then IFN is set to the operation it matched and
> > +   the arguments to the two replacement statements are put in m_ops.
> > +
> > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > +   This function matches the patterns shaped as:
> > +
> > +   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> > +   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> > +
> > +   c[i] = c[i] - ax;
> > +   c[i+1] = c[i+1] + bx;
> > +
> > +   If a match occurred then TRUE is returned, else FALSE.  The match is
> > +   performed after COMPLEX_MUL which would have done the majority of
> the work.
> > +   This function merely matches an ADD with a COMPLEX_MUL IFN.  The
> initial
> > +   match is expected to be in OP1 and the initial match operands in args0.
> */
> > +
> > +bool
> > +complex_fma_pattern::matches (complex_operation_t op1,
> > +			      slp_tree_to_load_perm_map_t * /* perm_cache
> */,
> > +			      vec<slp_tree> /* args0 */)
> > +{
> > +  this->m_ifn = IFN_LAST;
> > +
> > +  /* Find the two components.  We match Complex MUL first which
> reduces the
> > +     amount of work this pattern has to do.  After that we just match the
> > +     head node and we're done.:
> > +
> > +     * FMA: + +.
> > +
> > +     We need to ignore the two_operands nodes that may also match.
> > +     For that we can check if they have any scalar statements and also
> > +     check that it's not a permute node as we're looking for a normal
> > +     PLUS_EXPR operation.  */
> > +  if (op1 != CMPLX_NONE)
> > +    return false;
> > +
> > +  /* Find the two components.  We match Complex MUL first which
> reduces the
> > +     amount of work this pattern has to do.  After that we just match the
> > +     head node and we're done.:
> > +
> > +   * FMA: + + on a non-two_operands node.  */
> > +  if (SLP_TREE_LANE_PERMUTATION (*this->m_node).exists ()
> > +      || !SLP_TREE_SCALAR_STMTS (*this->m_node).exists ()
> > +      || !vect_match_expression_p (*this->m_node, PLUS_EXPR))
> > +    return false;
> > +
> > +  slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1];
> > +
> > +  if (vect_match_two_call_p (node, IFN_COMPLEX_MUL))
> > +    this->m_ifn = IFN_COMPLEX_FMA;
> > +  else if (vect_match_two_call_p (node, IFN_COMPLEX_MUL_CONJ))
> > +    this->m_ifn = IFN_COMPLEX_FMA_CONJ;
> > +  else
> > +    return false;
> > +
> > +  this->m_mul_nodes.safe_splice (SLP_TREE_CHILDREN (node));
> > +  node = SLP_TREE_CHILDREN (node)[0];
> > +  this->m_ops.create (this->m_num_args);
> > +
> > +  if (this->m_ifn == IFN_COMPLEX_FMA)
> > +    {
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> > +    }
> > +  else
> > +    {
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> > +    }
> > +
> > +  this->m_ncache.safe_push (*this->m_node);
> > +  this->m_nodes = &this->m_ncache;
> > +
> > +  return true;
> > +}
> > +
> > +bool
> > +complex_fma_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  auto_vec<slp_tree> args0;
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node, true, &args0);
> > +  return matches (op, perm_cache, args0);
> > +}
> > +
> > +bool
> > +complex_fma_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  /* FMA's top level node is + + and so is not two_operator,
> > +     we only need one child.  */
> > +  slp_tree node;
> > +  auto_vec<slp_tree> nodes;
> > +  auto_vec<slp_tree> new_nodes;
> > +  nodes.create (this->m_num_args);
> > +  hash_set<slp_tree> cache;
> > +  unsigned ix;
> > +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > +    {
> > +      slp_tree vnode = NULL;
> > +      if (vect_slp_make_linear (perm_cache, *this->m_node, node,
> &vnode))
> > +	{
> > +	  nodes.quick_push (vnode);
> > +	  if (vnode != node)
> > +	    new_nodes.safe_push (vnode);
> > +	  if (SLP_TREE_CHILDREN (node).length () == 2)
> > +	    {
> > +	      cache.add(SLP_TREE_CHILDREN (node)[0]);
> > +	      cache.add(SLP_TREE_CHILDREN (node)[1]);
> > +	    }
> > +	  else
> > +	    cache.add (node);
> > +	}
> > +      else
> > +	{
> > +	  FOR_EACH_VEC_ELT (new_nodes, ix, node)
> > +	    delete node;
> > +
> > +	  nodes.release();
> > +	  return false;
> > +	}
> > +    }
> > +
> > +  SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > +  SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
> > +
> > +  /* Ignore the part of the tree that becomes irrelevant for FMA.  */
> > +  vect_slp_reset_pattern (this->m_mul_nodes[0]);
> > +  vect_slp_reset_pattern (this->m_mul_nodes[1]);
> > +  vect_mark_stmts_as_in_pattern (&cache, this->m_mul_nodes[1]);
> > +
> > +  return true;
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_fms_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_fms_pattern : public complex_mul_pattern
> > +{
> > +  protected:
> > +    complex_fms_pattern (slp_tree *node)
> > +      : complex_mul_pattern (node)
> > +    {
> > +      this->m_num_args = 3;
> > +    }
> > +
> > +  public:
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_fms_pattern (node);
> > +    }
> > +
> > +    const char* get_name ()
> > +    {
> > +      return "Complex FMS";
> > +    }
> > +
> > +    bool matches (slp_tree_to_load_perm_map_t *);
> > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > +		  vec<slp_tree>);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex multiply and accumulate
> > +   and multiply and subtract patterns in SLP tree.
> > +   If the operation matches then IFN is set to the operation it matched and
> > +   the arguments to the two replacement statements are put in m_ops.
> > +
> > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > +   This function matches the patterns shaped as:
> > +
> > +   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> > +   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> > +
> > +   c[i] = c[i] - ax;
> > +   c[i+1] = c[i+1] + bx;
> > +
> > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> is
> > +   expected to be in OP1 and the initial match operands in args0.  */
> > +bool
> > +complex_fms_pattern::matches (complex_operation_t op1,
> > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > +			      vec<slp_tree> args0)
> > +{
> > +  this->m_ifn = IFN_LAST;
> > +
> > +  /* Find the two components.  We match Complex MUL first which
> reduces the
> > +     amount of work this pattern has to do.  After that we just match the
> > +     head node and we're done.:
> > +
> > +     * FMS: - +.  */
> > +  slp_tree child = NULL;
> > +
> > +  /* We need to ignore the two_operands nodes that may also match,
> > +     for that we can check if they have any scalar statements and also
> > +     check that it's not a permute node as we're looking for a normal
> > +     PLUS_EXPR operation.  */
> > +  if (op1 != PLUS_MINUS)
> > +    return false;
> > +
> > +  child = SLP_TREE_CHILDREN (args0[1])[1];
> > +  if (vect_detect_pair_op (child) != MINUS_PLUS)
> > +    return false;
> > +
> > +  /* First two nodes must be a multiply.  */
> > +  auto_vec<slp_tree> muls;
> > +  if (vect_match_call_complex_mla (child, 0) != MULT_MULT
> > +      || vect_match_call_complex_mla (child, 1, &muls) != MULT_MULT)
> > +    return false;
> > +
> > +  /* Now operand2+4 may lead to another expression.  */
> > +  auto_vec<slp_tree> left_op, right_op;
> > +  left_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> > +  right_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> > +
> > +  bool is_neg = vect_normalize_conj_loc (right_op);
> > +
> > +  this->m_ops.create (6);
> > +  child = SLP_TREE_CHILDREN (args0[0])[0];
> > +  this->m_ops.quick_push (child);
> > +  if (!is_neg)
> > +    {
> > +      this->m_ifn = IFN_COMPLEX_FMS;
> > +      this->m_ops.quick_push (left_op[1]);
> > +      this->m_ops.quick_push (left_op[0]);
> > +    }
> > +  else if (is_neg)
> > +    {
> > +      bool conj_first_operand;
> > +      if (!vect_validate_multiplication (perm_cache, left_op, right_op, false,
> > +					 &conj_first_operand))
> > +	return false;
> > +
> > +      this->m_ifn = IFN_COMPLEX_FMS_CONJ;
> > +      /* Check if the conjucate is on the first or second parameter.  */
> > +      if (!conj_first_operand)
> > +	{
> > +	  this->m_ops.quick_push (left_op[1]);
> > +	  this->m_ops.quick_push (left_op[0]);
> > +	}
> > +      else
> > +	{
> > +	  this->m_ops.quick_push (left_op[0]);
> > +	  this->m_ops.quick_push (left_op[1]);
> > +	}
> > +    }
> > +
> > +  this->m_ops.quick_push (child);
> > +  this->m_ops.quick_push (right_op[1]);
> > +  this->m_ops.quick_push (right_op[0]);
> > +
> > +  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this-
> >m_ops);
> > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > +  return true;
> > +}
> > +
> > +bool
> > +complex_fms_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  auto_vec<slp_tree> args0;
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node, true, &args0);
> > +  return matches (op, perm_cache, args0);
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_operations_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* This function combines all the existing pattern matchers above into one
> class
> > +   that shares the functionality between them.  The initial match is shared
> > +   between all complex operations.  */
> > +
> > +class complex_operations_pattern : public complex_pattern
> > +{
> > +  protected:
> > +    complex_operations_pattern (slp_tree *node)
> > +      : complex_pattern (node)
> > +    { }
> > +
> > +  gcall *build ();
> > +
> > +  /* The current "active" pattern.  */
> > +  vect_pattern *m_patt = NULL;
> > +
> > +  public:
> > +    using complex_pattern::matches;
> > +
> > +    bool matches (slp_tree_to_load_perm_map_t *perm_cache);
> > +    const char* get_name ();
> > +    bool validate_p (slp_tree_to_load_perm_map_t *perm_cache);
> > +    bool is_optab_supported_p (tree, optimization_type);
> > +    internal_fn get_ifn ();
> > +    gcall *build (vec_info *);
> > +    void reset (slp_tree *);
> > +    ~complex_operations_pattern ();
> > +
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_operations_pattern (node);
> > +    }
> > +};
> > +
> > +
> > +/* Clean up the pattern if one existed.  */
> > +
> > +complex_operations_pattern::~complex_operations_pattern ()
> > +{
> > +  if (this->m_patt)
> > +    delete this->m_patt;
> > +  this->m_patt = NULL;
> > +}
> > +
> > +/* Match but do not perform any additional operations on the SLP tree.  If
> the
> > +   match is successful then that pattern is made "active" in the class.  */
> > +
> > +bool
> > +complex_operations_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> > +
> > +  /* Check which pattern this may be.  Match longest pattern first.  */
> > +  this->m_patt = complex_fma_pattern::create (this->m_node);
> > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > +    return true;
> 
> Hmm, all of these allocations like also
> 
> +  for (unsigned x = 0; x < num__slp_patterns; x++)
> +    {
> +      vect_pattern *pattern = slp_patterns[x] (ref_node);
> +      found_p = pattern->recognize (perm_cache, vinfo);
> +      delete pattern;
> +      found_rec_p = found_p | found_rec_p;
> +    }
> 
> in the main driver are really due to the C++-ish layout of all this :/
> I don't like that very much.  It also highlights when reading
> the patch that it's hard to get the difference betwee
> recognize (), match () and validate () where all but the first is
> kind-of internal implementation details of the complex matchers.
> 
> So we could eventually get by with the main driver doing
> 
> +  for (unsigned x = 0; x < num__slp_patterns; x++)
> +    {
> +      vect_pattern *pattern = slp_patterns[x] (ref_node);
>        if (pattern)
>          found_rec_p = true;
>      }
> 
> thus merge ->recognize with the static allocator and not do
> any heap allocation when nothing is matched?  Then ...
> 
> > +  if (op == CMPLX_NONE)
> > +    return false;
> > +
> > +  delete this->m_patt;
> > +
> > +  this->m_patt = complex_fms_pattern::create (this->m_node);
> 
> ... this could use static typing and an automatic instance?
> 
> The myth that allocation & deallocation is cheap is really that,
> a myth.  We're doing 6 allocations & deallocations at this
> very toplevel for each SLP node (which means eventually for
> each statement in a function).
> 
> 
> So why for example would one need to override recognize ()?
> I'd argue for nailing it in place and to remove validate_p ()
> (should be done by matches?).  At least validate_p for
> the complex patterns seem to build SLP nodes which mean
> they actually build()?
> 
> 
> I think overall the patch looks sensible but it's still quite
> hard to follow the flow of an actual pattern match, possibly
> because of the strange factoring.
> 
> At this point to speed up things can you split the patch
> so there's only the very simplest (complex_add for example)
> matcher plus requirements?  Then we can push that which
> would allow me to actually play with it more easily and
> even try out changes?
> 
> I'd say split out the optimize_load_redistribution part of
> the first patch and merge the complex_add support into
> its remains so we have a first patch that does something?
> 
> Thanks and sorry for being dense here ... :/  There's pieces
> I think we should do better but I still can't lay out a
> solution and I guess pointing fingers at the individual pieces
> like above will lead to very slow progress only.
> 
> Richard.
> 
> > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > +    return true;
> > +
> > +  delete this->m_patt;
> > +
> > +  this->m_patt = complex_mul_pattern::create (this->m_node);
> > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > +    return true;
> > +
> > +  delete this->m_patt;
> > +
> > +  this->m_patt = complex_add_pattern::create (this->m_node);
> > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > +    return true;
> > +
> > +  delete this->m_patt;
> > +
> > +  this->m_patt = NULL;
> > +  return false;
> > +}
> > +
> > +/* Friendly name of the operation the pattern matches.  */
> > +
> > +const char*
> > +complex_operations_pattern::get_name ()
> > +{
> > +  if (!this->m_patt)
> > +    return "Complex Operations";
> > +
> > +  return this->m_patt->get_name ();
> > +}
> > +
> > +/* Only perform the pattern creation part of the matcher.  This creates
> and
> > +   returns the new pattern statement and encapsulates the currently
> active
> > +   pattern.  */
> > +
> > +gcall *
> > +complex_operations_pattern::build (vec_info *vinfo)
> > +{
> > +  if (!this->m_patt)
> > +    return NULL;
> > +
> > +  return this->m_patt->build (vinfo);
> > +}
> > +
> > +/* Validate the operation the currently active pattern detected.  */
> > +
> > +bool
> > +complex_operations_pattern::validate_p
> (slp_tree_to_load_perm_map_t *perm_cache)
> > +{
> > +  if (!this->m_patt)
> > +    return false;
> > +
> > +  return this->m_patt->validate_p (perm_cache);
> > +}
> > +
> > +/* Check if the obtab of the currently active matches is supported.  */
> > +
> > +bool
> > +complex_operations_pattern::is_optab_supported_p (tree vectype,
> > +						  optimization_type opt_type)
> > +{
> > +  if (!vectype || !this->m_patt)
> > +    return false;
> > +
> > +  return this->m_patt->is_optab_supported_p (vectype, opt_type);
> > +}
> > +
> > +/* Return the matched IFN from the current active matcher if any.  */
> > +
> > +internal_fn
> > +complex_operations_pattern::get_ifn ()
> > +{
> > +  if (!this->m_patt)
> > +    return IFN_LAST;
> > +
> > +  return this->m_patt->get_ifn ();
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * Pattern matching definitions
> > +
> **********************************************************
> ********************/
> > +
> > +#define SLP_PATTERN(x) &x::create
> > +vect_pattern_decl_t slp_patterns[]
> > +{
> > +  /* For least amount of back-tracking and more efficient matching
> > +     order patterns from the largest to the smallest.  Especially if they
> > +     overlap in what they can detect.  */
> > +
> > +  /* FMA overlaps with MUL but is the longer sequence.  Because we're in
> post
> > +     order traversal we can't match FMA if included in
> > +     complex_operations_pattern so must be checked on it's own.  */
> > +  SLP_PATTERN (complex_operations_pattern),
> > +};
> > +#undef SLP_PATTERN
> > +
> > +/* Set the number of SLP pattern matchers available.  */
> > +size_t num__slp_patterns =
> sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg,
> Germany; GF: Felix Imend

  reply	other threads:[~2020-11-16 15:55 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14 14:37 [PATCH 1/2] middle-end : Initial scaffolding and definitions for SLP patttern matches Tamar Christina
2020-11-14 14:39 ` [PATCH 2/2] middle-end : Add support for complex pattern detection for Add, Mul, FMA and FMS Tamar Christina
2020-11-16 15:11   ` Richard Biener
2020-11-16 15:54     ` Tamar Christina [this message]
2020-11-17  7:50       ` Richard Biener
2020-11-16 13:16 ` [PATCH 1/2] middle-end : Initial scaffolding and definitions for SLP patttern matches Richard Biener
2020-11-16 13:33   ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR08MB532584C37C8F6A2F9FFBD681FFE30@VI1PR08MB5325.eurprd08.prod.outlook.com \
    --to=tamar.christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hongtao.liu@intel.com \
    --cc=nd@arm.com \
    --cc=ook@ucw.cz \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).