Hi All, This patch series adds support for SLP vectorization of complex instructions [1]. These instructions exist only in their vector forms and require you to recognize two statements in parallel. Complex operations usually require a permute due to the fact that the real and imaginary numbers are stored intermixed but these vector instructions expect this and no longer need the compiler to generate a permute. The pass is able to insert additional permutes inside the SLP tree in order to change the data flow such that the new instructions can be used. These permutes are then optimized out later during optimize_slp. These instructions also support rotations along the Argand plane, as such the operands have to be re-ordered to coincide with their load group. For now, this patch only adds support for: * Complex Addition with rotation of 90 and 270. Addition with rotation of the second argument around the Argand plane. Supported rotations are 90 and 180. c = a + (b * I) and c = a + (b * I * I * I) * Complex Multiplication and Multiplication where one operand is conjucated. Complex multiplication and Conjucate Complex multiplication of the second parameter. c = a * b and c = a * conj (b) * Complex FMA and FMA where one operand is conjucated. * Complex FMS and FMS where one operand is conjucated. Complex FMLA, Conjucate FMLA of the second parameter and FMLS. c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b) For the conjucate cases it supports under fast-math that the operands that is being conjucated be flipped by flipping the arguments to the optab. This allows it to support c = conj (a) * b and c += conj (a) * b. where a, b and c are complex numbers. Complex dot-product is not currently supported in this patch set as build_slp fails for it. This will be provided as a future patch. These are supported for both integer and floating point and as such these don't look for real or imaginary pairs but instead rely on the early lowering of complex numbers by GCC and canonization of the operations such that it just recognizes any instruction sequence matching the operations requested. To be safe when the it is not sure it can support the operation or if it finds something it does not understand it backs off. This is done by looking at the order of the loads below the nodes. Anything that changes the order like a VEC_PERM is taken into account. The order of the loads define which operation is being done. For instance a complex add with conjucate always has the second loads in non-contiguous order. Because this information is queried often it is heavily cached and because we never modify nodes in place we can cache the operation through all SLP instances. When SLP failes (e.g. due to costing) we dissolve the changes and restore the previous relevancies of the stmts that were in a pattern before. [1] https://developer.arm.com/documentation/ddi0487/fc/ Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? (working on moving tests to middend IFN tests instead). Thanks, Tamar gcc/ChangeLog: * tree-vect-slp-patterns.c: New file. * Makefile.in: Use it. * doc/passes.texi: Document it. * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270, COMPLEX_MUL, COMPLEX_MUL_CONJ, COMPLEX_FMA, COMPLEX_FMA_CONJ): COMPLEX_FMS, COMPLEX_FMS_CONJ): New. * optabs.def (cadd90_optab, cadd270_optab, cmul_optab, cmul_conj_optab, cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab): New. * doc/md.texi: Document them -- inline copy of patch -- diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 273654cfa2525099ed0d9adc2f9085c8408b096f..4b556bdbbc2f989d894ef2dbe894318858eaa7aa 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1646,6 +1646,7 @@ OBJS = \ tree-vect-loop.o \ tree-vect-loop-manip.o \ tree-vect-slp.o \ + tree-vect-slp-patterns.o \ tree-vectorizer.o \ tree-vector-builder.o \ tree-vrp.o \ diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 813875b973bc73920f1195c4897ce3491ca713c2..68792e4c3eeca2bce0112fc28eba38700e66dc00 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -1493,7 +1493,7 @@ can be described by a series of letters for each operand. The overall constraint for an operand is made from the letters for this operand from the first alternative, a comma, the letters for this operand from the second alternative, a comma, and so on until the last alternative. -All operands for a single instruction must have the same number of +All operands for a single instruction must have the same number of alternatives. @ifset INTERNALS Here is how it is done for fullword logical-or on the 68000: @@ -1558,17 +1558,17 @@ the first, 1 for the second alternative, etc.). @xref{Output Statement}. @end ifset @ifclear INTERNALS -So the first alternative for the 68000's logical-or could be written as -@code{"+m" (output) : "ir" (input)}. The second could be @code{"+r" -(output): "irm" (input)}. However, the fact that two memory locations -cannot be used in a single instruction prevents simply using @code{"+rm" -(output) : "irm" (input)}. Using multi-alternatives, this might be +So the first alternative for the 68000's logical-or could be written as +@code{"+m" (output) : "ir" (input)}. The second could be @code{"+r" +(output): "irm" (input)}. However, the fact that two memory locations +cannot be used in a single instruction prevents simply using @code{"+rm" +(output) : "irm" (input)}. Using multi-alternatives, this might be written as @code{"+m,r" (output) : "ir,irm" (input)}. This describes -all the available alternatives to the compiler, allowing it to choose +all the available alternatives to the compiler, allowing it to choose the most efficient one for the current conditions. -There is no way within the template to determine which alternative was -chosen. However you may be able to wrap your @code{asm} statements with +There is no way within the template to determine which alternative was +chosen. However you may be able to wrap your @code{asm} statements with builtins such as @code{__builtin_constant_p} to achieve the desired results. @end ifclear @@ -3062,7 +3062,7 @@ instruction taking only the upper 16-bits of a 16-bits being 0. @item L -Integer that is valid as an immediate operand for a +Integer that is valid as an immediate operand for a shift instruction. Range 0 to 31. @item M @@ -3076,7 +3076,7 @@ Integer that is valid as an immediate operand for a custom instruction opcode. Range 0 to 255. @item P -An immediate operand for R2 andchi/andci instructions. +An immediate operand for R2 andchi/andci instructions. @item S Matches immediates which are addresses in the small @@ -6132,6 +6132,97 @@ floating-point mode. This pattern is not allowed to @code{FAIL}. +@cindex @code{cadd@var{m}@var{n}3} instruction pattern +@item @samp{cadd@var{m}@var{n}3} +Perform a vector addition of complex numbers in operand 1 with operand 2 +rotated by @var{m} degrees around the argand plane and storing the result in +operand 0. The instruction must perform the operation on data that in GCC lane +order where the 0th lane holds the real part and the the imaginary part in lane +1. + +The operation is only supported for vector modes @var{n} and with +rotations @var{m} of 90 or 270. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{cmla@var{m}4} instruction pattern +@item @samp{cmla@var{m}4} +Perform a vector floating point multiply and accumulate of complex numbers +in operand 0, operand 1 and operand 2. + +The instruction must perform the operation on data that in GCC lane +order where the 0th lane holds the real part and the the imaginary part in lane +1. + +The operation is only supported for vector modes @var{m}. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{cmla_conj@var{m}4} instruction pattern +@item @samp{cmla_conj@var{m}4} +Perform a vector floating point multiply and accumulate of complex numbers +in operand 0, operand 1 and the conjucate of operand 2. + +The instruction must perform the operation on data that in GCC lane +order where the 0th lane holds the real part and the the imaginary part in lane +1. + +The operation is only supported for vector modes @var{m}. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{cmls@var{m}4} instruction pattern +@item @samp{cmls@var{m}4} +Perform a vector floating point multiply and subtract of complex numbers +in operand 0, operand 1 and operand 2. + +The instruction must perform the operation on data that in GCC lane +order where the 0th lane holds the real part and the the imaginary part in lane +1. + +The operation is only supported for vector modes @var{m}. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{cmls_conj@var{m}4} instruction pattern +@item @samp{cmls_conj@var{m}4} +Perform a vector floating point multiply and subtract of complex numbers +in operand 0, operand 1 and the conjucate of operand 2. + +The instruction must perform the operation on data that in GCC lane +order where the 0th lane holds the real part and the the imaginary part in lane +1. + +The operation is only supported for vector modes @var{m}. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{cmul@var{m}4} instruction pattern +@item @samp{cmul@var{m}4} +Perform a vector floating point multiplication of complex numbers in operand 0 +and operand 1. + +The instruction must perform the operation on data that in GCC lane +order where the 0th lane holds the real part and the the imaginary part in lane +1. + +The operation is only supported for vector modes @var{m}. + +This pattern is not allowed to @code{FAIL}. + +@cindex @code{cmul_conj@var{m}4} instruction pattern +@item @samp{cmul_conj@var{m}4} +Perform a vector floating point multiplication of complex numbers in operand 0 +and the conjucate of operand 1. + +The instruction must perform the operation on data that in GCC lane +order where the 0th lane holds the real part and the the imaginary part in lane +1. + +The operation is only supported for vector modes @var{m}. + +This pattern is not allowed to @code{FAIL}. + @cindex @code{ffs@var{m}2} instruction pattern @item @samp{ffs@var{m}2} Store into operand 0 one plus the index of the least significant 1-bit @@ -7409,7 +7500,7 @@ If this pattern is not defined, then a @code{memory_barrier} pattern will be emitted, followed by a store of the value to the memory operand. @cindex @code{atomic_compare_and_swap@var{mode}} instruction pattern -@item @samp{atomic_compare_and_swap@var{mode}} +@item @samp{atomic_compare_and_swap@var{mode}} This pattern, if defined, emits code for an atomic compare-and-swap operation with memory model semantics. Operand 2 is the memory on which the atomic operation is performed. Operand 0 is an output operand which @@ -7424,7 +7515,7 @@ if the operation fails. If memory referred to in operand 2 contains the value in operand 3, then operand 4 is stored in memory pointed to by operand 2 and fencing based on -the memory model in operand 6 is issued. +the memory model in operand 6 is issued. If memory referred to in operand 2 does not contain the value in operand 3, then fencing based on the memory model in operand 7 is issued. @@ -7500,9 +7591,9 @@ none of these are available a compare-and-swap loop will be used. @itemx @samp{atomic_fetch_or@var{mode}}, @samp{atomic_fetch_and@var{mode}} @itemx @samp{atomic_fetch_xor@var{mode}}, @samp{atomic_fetch_nand@var{mode}} These patterns emit code for an atomic operation on memory with memory -model semantics, and return the original value. Operand 0 is an output -operand which contains the value of the memory location before the -operation was performed. Operand 1 is the memory on which the atomic +model semantics, and return the original value. Operand 0 is an output +operand which contains the value of the memory location before the +operation was performed. Operand 1 is the memory on which the atomic operation is performed. Operand 2 is the second operand to the binary operator. Operand 3 is the memory model to be used by the operation. @@ -7859,7 +7950,7 @@ simplified) from the PDP-11 target: (plus:HI (match_dup 0) (const_int -1)))] "" - + @{ if (which_alternative == 0) return "sob %0,%l1"; diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644 --- a/gcc/doc/passes.texi +++ b/gcc/doc/passes.texi @@ -709,7 +709,8 @@ loop. The pass is implemented in @file{tree-vectorizer.c} (the main driver), @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts and general loop utilities), @file{tree-vect-slp} (loop-aware SLP -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}. +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher. Analysis of data references is in @file{tree-data-ref.c}. SLP Vectorization. This pass performs vectorization of straight-line code. The diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 310d37aa53819791b5df1683afca831f08e5892a..acb7d9f3bdc757437d5492a652144ba31c2ef702 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -277,12 +277,21 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary) DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary) DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary) DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) + /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) /* Ternary math functions. */ DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, cmla_conj, ternary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary) +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST, cmls_conj, ternary) /* Unary integer ops. */ DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) diff --git a/gcc/optabs.def b/gcc/optabs.def index 78409aa14537d259bf90277751aac00d452a0d3f..19db9c00896cd08adfd20a01669990bbbebd79f1 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -290,6 +290,14 @@ OPTAB_D (atan_optab, "atan$a2") OPTAB_D (atanh_optab, "atanh$a2") OPTAB_D (copysign_optab, "copysign$F$a3") OPTAB_D (xorsign_optab, "xorsign$F$a3") +OPTAB_D (cadd90_optab, "cadd90$a3") +OPTAB_D (cadd270_optab, "cadd270$a3") +OPTAB_D (cmul_optab, "cmul$a3") +OPTAB_D (cmul_conj_optab, "cmul_conj$a3") +OPTAB_D (cmla_optab, "cmla$a4") +OPTAB_D (cmla_conj_optab, "cmla_conj$a4") +OPTAB_D (cmls_optab, "cmls$a4") +OPTAB_D (cmls_conj_optab, "cmls_conj$a4") OPTAB_D (cos_optab, "cos$a2") OPTAB_D (cosh_optab, "cosh$a2") OPTAB_D (exp10_optab, "exp10$a2") diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c new file mode 100644 index 0000000000000000000000000000000000000000..7a2c64aeb9f212cd3ee18adf110e6ac6696249fd --- /dev/null +++ b/gcc/tree-vect-slp-patterns.c @@ -0,0 +1,1844 @@ +/* SLP - Pattern matcher on SLP trees + Copyright (C) 2020 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "target.h" +#include "rtl.h" +#include "tree.h" +#include "gimple.h" +#include "tree-pass.h" +#include "ssa.h" +#include "optabs-tree.h" +#include "insn-config.h" +#include "recog.h" /* FIXME: for insn_data */ +#include "fold-const.h" +#include "stor-layout.h" +#include "gimple-iterator.h" +#include "cfgloop.h" +#include "tree-vectorizer.h" +#include "langhooks.h" +#include "gimple-walk.h" +#include "dbgcnt.h" +#include "tree-vector-builder.h" +#include "vec-perm-indices.h" +#include "gimple-fold.h" +#include "internal-fn.h" + +/* SLP Pattern matching mechanism. + + This extension to the SLP vectorizer allows one to transform the generated SLP + tree based on any pattern. The difference between this and the normal vect + pattern matcher is that unlike the former, this matcher allows you to match + with instructions that do not belong to the same SSA dominator graph. + + The only requirement that this pattern matcher has is that you are only + only allowed to either match an entire group or none. + + The pattern matcher currently only allows you to perform replacements to + internal functions. + + Once the patterns are matched it is one way, these cannot be undone. It is + currently not supported to match patterns recursively. + + To add a new pattern, implement the vect_pattern class and add the type to + slp_patterns. + +*/ + +/******************************************************************************* + * vect_pattern class + ******************************************************************************/ + +/* Default implementation of recognize that peforms matching, validation and + replacement of nodes but that can be overriden if required. */ + +bool +vect_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache, + vec_info *vinfo) +{ + if (this->matches (perm_cache)) + { + tree vectype = SLP_TREE_VECTYPE (*this->m_node); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Found %s pattern in SLP tree\n", + this->get_name ()); + + if (this->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Target supports %s vectorization with mode %T\n", + internal_fn_name (this->get_ifn ()), vectype); + } + else + { + if (dump_enabled_p ()) + { + if (!vectype) + dump_printf_loc (MSG_NOTE, vect_location, + "Target does not support vector type for %T\n", + SLP_TREE_DEF_TYPE (*this->m_node)); + else + dump_printf_loc (MSG_NOTE, vect_location, + "Target does not support %s for vector type " + "%T\n", internal_fn_name (this->get_ifn ()), + vectype); + } + return false; + } + } + else + return false; + + if (!this->validate_p (perm_cache)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "transformation for %s not valid due to post " + "condition\n", + internal_fn_name (this->get_ifn ())); + return false; + } + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "Creating vec patterns\n"); + + gcall* call_stmt = this->build (vinfo); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "\t %p stmt: %G", + *this->m_node, call_stmt); + + return true; +} + +/******************************************************************************* + * General helper types + ******************************************************************************/ + +/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can + be matched when looking for expressions that we are interested matching for + complex numbers addition and mla. */ + +typedef enum _complex_operation : unsigned { + PLUS_PLUS, + MINUS_PLUS, + PLUS_MINUS, + MULT_MULT, + CMPLX_NONE +} complex_operation_t; + +/* A simple pair type used for ordering of combine operations. */ + +typedef struct map_ { int a; int b; } map_t; + +/******************************************************************************* + * General helper functions + ******************************************************************************/ + +/* Helper function of linear_loads_p that checks to see if the load permutation + is sequential and in monotonically increasing order of loads with no gaps. +*/ + +static inline bool +is_linear_load_p (load_permutation_t loads) +{ + if (loads.length() == 0) + return false; + + unsigned leader = loads[0]; + unsigned load, i; + FOR_EACH_VEC_ELT_FROM (loads, i, load, 1) + if (load != ++leader) + return false; + return true; +} + + +/* Check to see if all loads rooted in ROOT are linear. Linearity is + defined as having no gaps between values loaded. */ + +static load_permutation_t +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root, + bool *linear) +{ + *linear = false; + if (!root) + return vNULL; + + unsigned i; + load_permutation_t loads = vNULL; + load_permutation_t *tmp; + + if ((tmp = perm_cache->get (root)) != NULL) + { + *linear = is_linear_load_p (*tmp); + return *tmp; + } + + perm_cache->put (root, vNULL); + + /* If it's a load node, then just read the load permute. */ + if (SLP_TREE_LOAD_PERMUTATION (root).exists ()) + { + loads = SLP_TREE_LOAD_PERMUTATION (root); + perm_cache->put (root, loads); + if (!is_linear_load_p (loads)) + return loads; + } + else if (SLP_TREE_DEF_TYPE (root) == vect_external_def) + { + loads.create (SLP_TREE_LANES (root)); + tree op; + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op) + { + gimple *defstmt = SSA_NAME_DEF_STMT (op); + if (!is_gimple_assign (defstmt)) + return vNULL; + + switch (gimple_assign_rhs_code (defstmt)) + { + case IMAGPART_EXPR: + loads.safe_push (1); + break; + case REALPART_EXPR: + loads.safe_push (0); + break; + default: + { + loads.release (); + return vNULL; + } + } + } + + perm_cache->put (root, loads); + if (!is_linear_load_p (loads)) + return loads; + } + else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def) + return vNULL; + + + + auto_vec all_loads; + bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists (); + + slp_tree child; + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child) + { + /* Seed it with a value to prevent infinite recursions. */ + loads = linear_loads_p (perm_cache, child, linear); + if ((!*linear && !is_perm) || !loads.exists ()) + return loads; + + all_loads.safe_push (loads); + } + + if (is_perm) + { + lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root); + load_permutation_t nloads; + nloads.create (SLP_TREE_LANES (root)); + nloads.quick_grow (SLP_TREE_LANES (root)); + for (i = 0; i < SLP_TREE_LANES (root); i++) + nloads[i] = all_loads[perm[i].first][perm[i].second]; + + perm_cache->put (root, nloads); + if (!is_linear_load_p (nloads)) + return nloads; + loads = nloads; + } + + perm_cache->put (root, loads); + *linear = true; + return loads; +} + +/* Builds a permutation group from the operands in OPS and stores it in BLOCKS. + The group describes how to combine the operators to get a valid linear node. + + This is used when combining multiple children from a two_operators node into + one using a lane permute to select the appropriate lane. As an example the + permute { [0 0] [1 4] [2 2] [3 3] [1 4] [5 5] } says the nodes which occur + twice in a group, e.g [0 0] only needs itself to possibly be made linear + whereas [1 4] means to combine the nodes 1 and 4. */ + +static void +vect_build_perm_groups (slp_tree_to_load_perm_map_t *perm_cache, + map_t *blocks, vec ops) +{ + slp_tree op; + unsigned i; + bool is_linear = false; + unsigned min_eq = -1, max_eq = 0; + unsigned min_idx = 0, max_idx = 0; + FOR_EACH_VEC_ELT (ops, i, op) + { + load_permutation_t perms = linear_loads_p (perm_cache, op, &is_linear); + unsigned x, imin = -1, imax = 0; + for (x = 0; x < perms.length () && !is_linear; x++) + { + imin = MIN (imin, perms[x]); + imax = MAX (imax, perms[x]); + } + + if (imin != imax || perms.length () == 0 || is_linear) + blocks[i] = {i, i}; + else + { + if (imin <= min_eq) + { + min_eq = imin; + min_idx = i; + } + + if (imin >= max_eq) + { + max_eq = imin; + max_idx = i; + } + } + } + + /* Now fill in the gap. */ + blocks[min_idx] = {min_idx, max_idx}; + blocks[max_idx] = {min_idx, max_idx}; + + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, "pattern group: { "); + for (i = 0; i < ops.length (); i++) + dump_printf (MSG_NOTE,"[%d %d] ", blocks[i].a, blocks[i].b); + dump_printf (MSG_NOTE,"}\n"); + } + +} + +/* This function attempts to make a node rooted in NODE with parent PARENT + linear. If the node if already linear than the node itself is returned + in RESULT. + + If the node is not linear then a new VEC_PERM_EXPR node is created with a + lane permute that when applied will make the node linear. If such a + permute cannot be created then FALSE is returned from the function. + + Here linearity is defined as having a sequential, monotically increasing + load position inside the load permute generated by the loads reachable from + NODE. */ + +static bool +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache, + slp_tree parent, slp_tree node, slp_tree *result) +{ + bool is_linear = false; + unsigned x, val; + load_permutation_t load_perm = linear_loads_p (perm_cache, node, &is_linear); + if (is_linear) + { + *result = node; + return true; + } + + /* Attempt to linearise the permute. */ + vec > zipped; + zipped.create (load_perm.length ()); + FOR_EACH_VEC_ELT (load_perm, x, val) + zipped.quick_push (std::make_pair (val, x)); + + typedef const std::pair* cmp_t; + zipped.qsort ([](const void *a, const void *b) -> int + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; }); + + /* Verify if we have a linear permute sequence. */ + if (zipped.length () > 0) + { + unsigned leader = zipped[0].first; + for (x = 1; x < zipped.length (); x++) + if(!(is_linear = (zipped[x].first == ++leader))) + break; + } + + if (!is_linear) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Loads could not be made linear %p\n", + node); + zipped.release (); + return false; + } + + for (x = 0; x < zipped.length (); x++) + zipped[x].first = 0; + + /* Create the new permute node and store it instead. */ + slp_tree vnode = vect_create_new_slp_node (vNULL, 1); + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR; + SLP_TREE_LANE_PERMUTATION (vnode) = zipped; + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent); + SLP_TREE_CHILDREN (vnode).quick_push (node); + SLP_TREE_REF_COUNT (vnode) = 1; + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node); + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent); + *result = vnode; + return is_linear; +} + +/* Helper utility to check to see if the permutation PERM is one that can be + used in a node combination operation. This is defined as the permute not + having all the elements being the same. e.g [0 0]. */ + +static inline bool +vect_can_combine_node_p (load_permutation_t perm, bool is_linear) +{ + if (is_linear) + return false; + + unsigned i, x; + FOR_EACH_VEC_ELT (perm, i, x) + if (perm[0] != x) + return false; + + return true; +} + +/* This function combines the nodes in MAP together to make a new node using a + lane permute. The nodes to combine are stored in ENTRIES and the resulting + node is returned in RESULT. + + If the nodes are already linear then this function fails and returns FALSE. + Otherwise it returns the new node and TRUE. */ + +static bool +vect_slp_make_combine_linear (slp_tree_to_load_perm_map_t *perm_cache, + slp_tree parent, vec entries, map_t map, + slp_tree *result) +{ + if (map.a == map.b) + return false; + + slp_tree node_a = entries[map.a]; + slp_tree node_b = entries[map.b]; + + bool is_a_linear = false; + bool is_b_linear = false; + + load_permutation_t load_perm_a + = linear_loads_p (perm_cache, node_a, &is_a_linear); + if (!vect_can_combine_node_p (load_perm_a, is_a_linear)) + return false; + + load_permutation_t load_perm_b + = linear_loads_p (perm_cache, node_b, &is_b_linear); + if (!vect_can_combine_node_p (load_perm_b, is_b_linear)) + return false; + + if (!load_perm_a.exists () || !load_perm_b.exists ()) + return false; + + /* Now we need to figure which node is first. */ + auto_vec nodes; + nodes.create (2); + vec > perm; + perm.create (2); + if (load_perm_a[0] < load_perm_b[0]) + { + perm.quick_push (std::make_pair (0, 0)); + perm.quick_push (std::make_pair (1, 0)); + } + else + { + perm.quick_push (std::make_pair (1, 0)); + perm.quick_push (std::make_pair (0, 0)); + } + + nodes.quick_push (node_a); + nodes.quick_push (node_b); + + slp_tree vnode = vect_create_new_slp_node (vNULL, 1); + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR; + SLP_TREE_LANE_PERMUTATION (vnode) = perm; + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent); + SLP_TREE_CHILDREN (vnode).safe_splice (nodes); + SLP_TREE_REF_COUNT (vnode) = 1; + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (parent); + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent); + *result = vnode; + return true; +} + +/* Checks to see of the expression represented by NODE is a gimple assign with + code CODE. */ + +static inline bool +vect_match_expression_p (slp_tree node, tree_code code) +{ + if (!node + || !SLP_TREE_REPRESENTATIVE (node)) + return false; + + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node)); + if (!is_gimple_assign (expr) + || gimple_assign_rhs_code (expr) != code) + return false; + + return true; +} + +/* Checks to see if the expression represented by NODE is a call to the internal + function FN. */ + +static inline bool +vect_match_call_p (slp_tree node, internal_fn fn) +{ + if (!node + || !SLP_TREE_REPRESENTATIVE (node)) + return false; + + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node)); + if (!gimple_call_internal_p (expr, fn)) + return false; + + return true; +} + +/* Checks to see if NODE is a two_operands node whom's operands are calls to the + internal function FN. */ + +static bool +vect_match_two_call_p (slp_tree node, internal_fn fn) +{ + if (!SLP_TREE_REPRESENTATIVE (node) + || SLP_TREE_CODE (node) != VEC_PERM_EXPR + || SLP_TREE_CHILDREN (node).length () != 2) + return false; + + vec children = SLP_TREE_CHILDREN (node); + + return vect_match_call_p (children[0], fn) + && vect_match_call_p (children[1], fn); +} + +/* Check if the given lane permute in PERMUTES matches an alternating sequence + of {P0 P1 P0 P1 ...}. This to account for unrolled loops. Further mode + there resulting permute must be linear. */ + +static bool +vect_check_lane_permute (lane_permutation_t &permutes, + unsigned p0, unsigned p1) +{ + if (permutes.length () == 0) + return false; + + unsigned val[2] = {p0, p1}; + unsigned seed = permutes[0].second; + for (unsigned i = 0; i < permutes.length (); i++) + if (permutes[i].first != val[i % 2] + || permutes[i].second != seed++) + return false; + + return true; +} + +/* This function will match the two gimple expressions representing NODE1 and + NODE2 in parallel and returns the pair operation that represents the two + expressions in the two statements. + + If match is successful then the corresponding complex_operation is + returned and the arguments to the two matched operations are returned in OPS. + + If TWO_OPERANDS it is expected that the LANES of the parent VEC_PERM select + from the two nodes alternatingly. + + If unsuccessful then CMPLX_NONE is returned and OPS is untouched. + + e.g. the following gimple statements + + stmt 0 _39 = _37 + _12; + stmt 1 _6 = _38 - _36; + + will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}. +*/ + +static complex_operation_t +vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t &lanes, + bool two_operands = true, vec *ops = NULL) +{ + complex_operation_t result = CMPLX_NONE; + + if (vect_match_expression_p (node1, MINUS_EXPR) + && vect_match_expression_p (node2, PLUS_EXPR) + && (!two_operands || vect_check_lane_permute (lanes, 0, 1))) + result = MINUS_PLUS; + else if (vect_match_expression_p (node1, PLUS_EXPR) + && vect_match_expression_p (node2, MINUS_EXPR) + && (!two_operands || vect_check_lane_permute (lanes, 0, 1))) + result = PLUS_MINUS; + else if (vect_match_expression_p (node1, PLUS_EXPR) + && vect_match_expression_p (node2, PLUS_EXPR)) + result = PLUS_PLUS; + else if (vect_match_expression_p (node1, MULT_EXPR) + && vect_match_expression_p (node2, MULT_EXPR)) + result = MULT_MULT; + + if (result != CMPLX_NONE && ops != NULL) + { + ops->create (2); + ops->quick_push (node1); + ops->quick_push (node2); + } + return result; +} + +/* Overload of vect_detect_pair_op that matches against the representative + statements in the children of NODE. It is expected that NODE has exactly + two children and when TWO_OPERANDS then NODE must be a VEC_PERM. */ + +static complex_operation_t +vect_detect_pair_op (slp_tree node, bool two_operands = true, + vec *ops = NULL) +{ + if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR) + return CMPLX_NONE; + + if (SLP_TREE_CHILDREN (node).length () != 2) + return CMPLX_NONE; + + vec children = SLP_TREE_CHILDREN (node); + lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node); + + return vect_detect_pair_op (children[0], children[1], lanes, two_operands, + ops); +} + +/* This function marks every statement that is being replaced during the + the pattern matching as PURE. Normally when replacing a statement due + to a pattern we add the statement to the STMT_VINFO_PATTERN_DEF_SEQ of + the pattern that is replacing them. In this case however this won't + work as when doing the replacement we are changing the nodes that are + used by the statements. This means that when vectorized the SSA chain + is different than in the BB. + + Declaring the statements as part of the sequence will then cause SSA + verification to fail as we may refer to statements that were not in the + original USE-DEF chain of the statement we are replacing. + + The relevancy of the statements are saved and are later restored if SLP is to + be dissolved. */ + +static void +vect_mark_stmts_as_in_pattern (hash_set *cache, slp_tree node) +{ + if (cache->add (node)) + return; + + unsigned i; + stmt_vec_info stmt_info; + slp_tree child; + + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info) + { + if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info))) + return; + + vect_save_relevancy (stmt_info); + STMT_VINFO_SLP_VECT_ONLY (stmt_info) = true; + STMT_SLP_TYPE (stmt_info) = pure_slp; + } + + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) + vect_mark_stmts_as_in_pattern (cache, child); +} + + +/******************************************************************************* + * complex_pattern class + ******************************************************************************/ + +/* SLP Complex Numbers pattern matching. + + As an example, the following simple loop: + + double a[restrict N]; double b[restrict N]; double c[restrict N]; + + for (int i=0; i < N; i+=2) + { + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + } + + which represents a complex addition on with a rotation of 90* around the + argand plane. i.e. if `a` and `b` were complex numbers then this would be the + same as `a + (b * I)`. + + Here the expressions for `c[i]` and `c[i+1]` are independent but have to be + both recognized in order for the pattern to work. As an SLP tree this is + represented as + + +--------------------------------+ + | stmt 0 *_9 = _10; | + | stmt 1 *_15 = _16; | + +--------------------------------+ + | + | + v + +--------------------------------+ + | stmt 0 _10 = _4 - _8; | + | stmt 1 _16 = _12 + _14; | + | lane permutation { 0[0] 1[1] } | + +--------------------------------+ + | | + | | + | | + +-----+ | | +-----+ + | | | | | | + +-----| { } |<-----+ +----->| { } --------+ + | | | +------------------| | | + | +-----+ | +-----+ | + | | | | + | | | | + | +------|------------------+ | + | | | | + v v v v + +--------------------------+ +--------------------------------+ + | stmt 0 _8 = *_7; | | stmt 0 _4 = *_3; | + | stmt 1 _14 = *_13; | | stmt 1 _12 = *_11; | + | load permutation { 1 0 } | | load permutation { 0 1 } | + +--------------------------+ +--------------------------------+ + + The pattern matcher allows you to replace both statements 0 and 1 or none at + all. Because this operation is a two operands operation the actual nodes + being replaced are those in the { } nodes. The actual scalar statements + themselves are not replaced or used during the matching but instead the + SLP_TREE_REPRESENTATIVE statements are inspected. You are also allowed to + replace and match on any number of nodes. + + Because the pattern matcher matches on the representative statement for the + SLP node the case of two_operators it allows you to match the children of the + node. This is done using the method `recognize ()`. + +*/ + +/* The complex_pattern class contains common code for pattern matchers that work + on complex numbers. These provide functionality to allow de-construction and + validation of sequences depicting/transforming REAL and IMAG pairs. */ + +class complex_pattern : public vect_pattern +{ + protected: + vec *m_nodes = NULL; + complex_pattern (slp_tree *node) + : vect_pattern (node) + { } + + public: + bool validate_p (slp_tree_to_load_perm_map_t *); + gcall *build (vec_info *); +}; + +/* The post transform and validation function for the complex number + patterns. This will re-arrange the tree and re-organize the nodes such + that they can be used by the complex number instructions that are to be + created. It does this by doing the following steps: + + For each node that isn't already linear a new permute node is created which + when applied makes the node linear. If such permutation does not exist then + the function returns FALSE. For nodes that are already linear no work is + done. */ + +bool +complex_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache) +{ + slp_tree node; + unsigned ix; + auto_vec workset; + FOR_EACH_VEC_ELT (this->m_ops, ix, node) + { + auto_vec nodes; + nodes.create (this->m_num_args); + slp_tree tmp = NULL; + auto_vec new_nodes; + + unsigned i; + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp) + { + slp_tree vnode = NULL; + if (vect_slp_make_linear (perm_cache, node, tmp, &vnode)) + { + nodes.quick_push (vnode); + if (vnode != tmp) + new_nodes.safe_push (vnode); + } + else + { + FOR_EACH_VEC_ELT (new_nodes, i, tmp) + delete tmp; + + FOR_EACH_VEC_ELT (workset, i, tmp) + delete tmp; + + nodes.release(); + return false; + } + } + + slp_tree new_node + = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node), + SLP_TREE_CHILDREN (node).length ()); + SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node); + SLP_TREE_LANE_PERMUTATION (new_node) = SLP_TREE_LANE_PERMUTATION (node); + SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node); + SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node); + SLP_TREE_REPRESENTATIVE (new_node) = SLP_TREE_REPRESENTATIVE (node); + SLP_TREE_CHILDREN (new_node).safe_splice (nodes); + SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node); + + workset.safe_push (new_node); + } + + FOR_EACH_VEC_ELT (workset, ix, node) + SLP_TREE_CHILDREN (*this->m_node)[ix] = node; + + return true; +} + + +/* Create a replacement pattern statement for each node in m_node and inserts + the new statement into m_node as the new representative statement. The old + statement is marked as being in a pattern defined by the new statement. The + statement is created as call to internal function IFN with m_num_args + arguments. + + Futhermore the new pattern is also added to the vectorization information + structure VINFO and the old statement STMT_INFO is marked as unused while + the new statement is marked as used and the number of SLP uses of the new + statement is incremented. + + The newly created SLP nodes are marked as SLP only and will be dissolved + if SLP is aborted. + + The newly created gimple call is returned and the BB remains unchanged. + + This default method is designed to only match against simple operands where + all the input and output types are the same. +*/ + +gcall * +complex_pattern::build (vec_info *vinfo) +{ + stmt_vec_info stmt_info; + + auto_vec args; + args.create (this->m_num_args); + args.quick_grow_cleared (this->m_num_args); + slp_tree node; + unsigned ix; + stmt_vec_info call_stmt_info; + gcall *call_stmt = NULL; + + FOR_EACH_VEC_ELT (*this->m_nodes, ix, node) + { + /* Calculate the location of the statement in NODE to replace. */ + stmt_info = SLP_TREE_REPRESENTATIVE (node); + gimple* old_stmt = STMT_VINFO_STMT (stmt_info); + tree type = TREE_TYPE (gimple_get_lhs (old_stmt)); + + /* Create the argument set for use by gimple_build_call_internal_vec. */ + for (unsigned i = 0; i < this->m_num_args; i++) + { + tree var = make_temp_ssa_name (type, old_stmt, "_pa"); + args[i] = var; + } + + /* Create the new pattern statements. */ + call_stmt = gimple_build_call_internal_vec (this->m_ifn, args); + tree var = make_temp_ssa_name (type, call_stmt, "slp_patt"); + gimple_call_set_lhs (call_stmt, var); + gimple_set_location (call_stmt, gimple_location (old_stmt)); + gimple_call_set_nothrow (call_stmt, true); + + /* Adjust the book-keeping for the new and old statements for use during + SLP. This is required to get the right VF and statement during SLP + analysis. These changes are created after relevancy has been set for + the nodes as such we need to manually update them. Any changes will be + undone if SLP is cancelled. */ + call_stmt_info + = vinfo->add_pattern_stmt (call_stmt, stmt_info); + + /* add_pattern_stmt can't be done in vect_mark_pattern_stmts because + the non-SLP pattern matchers already have added the statement to VINFO + by the time it is called. Some of them need to modify the returned + stmt_info. vect_mark_pattern_stmts is called by recog_pattern and it + would increase the size of each pattern with boilerplate code to make + the call there. */ + vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt, + SLP_TREE_VECTYPE (node)); + STMT_SLP_TYPE (call_stmt_info) = pure_slp; + STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true; + + /* Since we are replacing all the statements in the group with the same + thing it doesn't really matter. So just set it every time a new stmt + is created. */ + SLP_TREE_REPRESENTATIVE (node) = call_stmt_info; + SLP_TREE_CODE (node) = CALL_EXPR; + } + return call_stmt; +} + +/******************************************************************************* + * complex_add_pattern class + ******************************************************************************/ + +class complex_add_pattern : public complex_pattern +{ + protected: + complex_add_pattern (slp_tree *node) + : complex_pattern (node) + { + this->m_num_args = 2; + } + + public: + static vect_pattern* create (slp_tree *node) + { + return new complex_add_pattern (node); + } + + const char* get_name () + { + return "Complex Addition"; + } + + bool matches (slp_tree_to_load_perm_map_t *); + bool matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, + vec ops); +}; + +/* Pattern matcher for trying to match complex addition pattern in SLP tree. + + If no match is found then IFN is set to IFN_LAST. + This function matches the patterns shaped as: + + c[i] = a[i] - b[i+1]; + c[i+1] = a[i+1] + b[i]; + + If a match occurred then TRUE is returned, else FALSE. The initial match is + expected to be in OP1 and the initial match operands in args0. */ + +bool +complex_add_pattern::matches (complex_operation_t op, + slp_tree_to_load_perm_map_t *perm_cache, + vec ops) +{ + this->m_ifn = IFN_LAST; + this->m_ops.safe_splice (ops); + + /* Find the two components. Rotation in the complex plane will modify + the operations: + + * Rotation 0: + + + * Rotation 90: - + + * Rotation 180: - - + * Rotation 270: + - + + Rotation 0 and 180 can be handled by normal SIMD code, so we don't need + to care about them here. */ + if (op == MINUS_PLUS) + this->m_ifn = IFN_COMPLEX_ADD_ROT90; + else if (op == PLUS_MINUS) + this->m_ifn = IFN_COMPLEX_ADD_ROT270; + + /* verify that there is a permute, otherwise this isn't a pattern we + we support. */ + bool is_linear = false; + bool all_linear = true; + unsigned x; + for (x = 0; x < this->m_ops.length () && all_linear; x++) + { + linear_loads_p (perm_cache, this->m_ops[x], &is_linear); + all_linear = all_linear && is_linear; + } + + if (this->m_ifn == IFN_LAST || all_linear) + return false; + + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node); + return true; +} + +bool +complex_add_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache) +{ + complex_operation_t op + = vect_detect_pair_op (*this->m_node, true, &this->m_ops); + return matches (op, perm_cache, this->m_ops); +} + +/******************************************************************************* + * complex_mul_pattern + ******************************************************************************/ + +/* Helper function of that looks for a match in the CHILDth child of NODE. The + child used is stored in RES. + + If the match is successful then ARGS will contain the operands matched + and the complex_operation_t type is returned. If match is not successful + then CMPLX_NONE is returned and ARGS is left unmodified. */ + +static inline complex_operation_t +vect_match_call_complex_mla (slp_tree node, unsigned child, + vec *args = NULL, slp_tree *res = NULL) +{ + gcc_assert (child < SLP_TREE_CHILDREN (node).length ()); + + slp_tree data = SLP_TREE_CHILDREN (node)[child]; + + if (res) + *res = data; + + return vect_detect_pair_op (data, false, args); +} + +/* Check to see if either of the trees in ARGS are a NEGATE_EXPR. If the first + child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE. + + If a negate is found then the values in ARGS are reordered such that the + negate node is always the second one and the entry is replaced by the child + of the negate node. */ + +static inline bool +vect_normalize_conj_loc (vec args, bool *neg_first_p = NULL) +{ + gcc_assert (args.length () == 2); + bool neg_found = false; + + if (vect_match_expression_p (args[0], NEGATE_EXPR)) + { + std::swap (args[0], args[1]); + neg_found = true; + if (neg_first_p) + *neg_first_p = true; + } + else if (vect_match_expression_p (args[1], NEGATE_EXPR)) + { + neg_found = true; + if (neg_first_p) + *neg_first_p = false; + } + + if (neg_found) + args[1] = SLP_TREE_CHILDREN (args[1])[0]; + + return neg_found; +} + +/* Helper function that checks to see if the load permutation PERM is on that + belongs to a duplication operation. i.e. checks to see if all the values in + PERM are the same. */ + +static inline bool +vect_is_dup (load_permutation_t perm, unsigned idx) +{ + for (unsigned i = 0; i < perm.length (); i++) + if (perm[i] != idx) + return false; + + return true; +} + +/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR + nodes but also that they represent an operation that is either a complex + multiplication or a complex multiplication by conjucated value. + + Of the negation is expected to be in the first half of the tree (As required + by an FMS pattern) then NEG_FIRST is true. If the operation is a conjucate + operation then CONJ_FIRST_OPERAND is set to indicate whether the first or + second operand contains the conjucate operation. */ + +static inline bool +vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, + vec left_op, vec right_op, + bool neg_first, bool *conj_first_operand) +{ + /* The presence of a negation indicates that we have either a conjucate or a + rotation. We need to distinguish which one. */ + *conj_first_operand = false; + + load_permutation_t perm; + bool is_linear; + /* Complex conjucates have the negation on the imaginary part of the + number where rotations affect the real component. So check if the + negation is on a dup of lane 1. */ + perm = linear_loads_p (perm_cache, right_op[1], &is_linear); + if (is_linear || !vect_is_dup (perm, 1)) + return false; + + /* Check if the conjucate is on the second first or second operand. The + order of the node with the conjucate value determines this, and the dup + node must be one of lane 0 of the same DR as the neg node. */ + perm = linear_loads_p (perm_cache, left_op[0], &is_linear); + if (is_linear) + { + perm = linear_loads_p (perm_cache, left_op[1], &is_linear); + if (is_linear) + return false; + } + else if (!neg_first) + *conj_first_operand = true; + else + return false; + + if (!vect_is_dup (perm, 0)) + return false; + + return true; +} + +/* Helper function to help distinguish between a conjucate and a rotation in a + complex multiplication. The operations have similar shapes but the order of + the load permutes are different. This function returns TRUE when the order + is consistent with a multiplication or multiplication by conjucated + operand but returns FALSE if it's a multiplication by rotated operand. */ + +static inline bool +vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache, + vec op, unsigned idx) +{ + /* The left node is the more common case, test it first. */ + bool is_linear; + load_permutation_t perm = linear_loads_p (perm_cache, op[0], &is_linear); + if (is_linear || !vect_is_dup (perm, idx)) + { + perm = linear_loads_p (perm_cache, op[1], &is_linear); + if (is_linear || !vect_is_dup (perm, idx)) + return false; + } + + return true; +} + +class complex_mul_pattern : public complex_pattern +{ + protected: + /* Allocate enough space for FMA as well. */ + map_t m_blocks[6] = {}; + bool m_inplace = false; + complex_mul_pattern (slp_tree *node) + : complex_pattern (node) + { + this->m_num_args = 2; + } + + public: + static vect_pattern* create (slp_tree *node) + { + return new complex_mul_pattern (node); + } + + const char* get_name () + { + return "Complex Multiplication"; + } + + bool validate_p (slp_tree_to_load_perm_map_t *); + bool matches (slp_tree_to_load_perm_map_t *); + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *, + vec); +}; + +/* Pattern matcher for trying to match complex multiply pattern in SLP tree + If the operation matches then IFN is set to the operation it matched + and the arguments to the two replacement statements are put in m_ops. + + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged. + + This function matches the patterns shaped as: + + double ax = (b[i+1] * a[i]); + double bx = (a[i+1] * b[i]); + + c[i] = c[i] - ax; + c[i+1] = c[i+1] + bx; + + If a match occurred then TRUE is returned, else FALSE. The initial match is + expected to be in OP1 and the initial match operands in args0. */ + +bool +complex_mul_pattern::matches (complex_operation_t op, + slp_tree_to_load_perm_map_t *perm_cache, + vec /* ops */) +{ + this->m_ifn = IFN_LAST; + + if (op != MINUS_PLUS) + return false; + + slp_tree root = *this->m_node; + /* First two nodes must be a multiply. */ + auto_vec muls; + if (vect_match_call_complex_mla (root, 0) != MULT_MULT + || vect_match_call_complex_mla (root, 1, &muls) != MULT_MULT) + return false; + + /* Now operand2+4 may lead to another expression. */ + auto_vec left_op, right_op; + left_op.safe_splice (SLP_TREE_CHILDREN (muls[0])); + right_op.safe_splice (SLP_TREE_CHILDREN (muls[1])); + + bool neg_first; + bool is_neg = vect_normalize_conj_loc (right_op, &neg_first); + + if (!is_neg) + { + /* A multiplication needs to multiply agains the real pair, otherwise + the pattern matches that of FMS. */ + if (!vect_validate_multiplication (perm_cache, left_op, 0)) + return false; + this->m_ifn = IFN_COMPLEX_MUL; + this->m_ops.safe_splice (left_op); + this->m_ops.safe_splice (right_op); + } + else if (is_neg) + { + bool conj_first_operand; + if (!vect_validate_multiplication (perm_cache, left_op, right_op, + neg_first, &conj_first_operand)) + return false; + + this->m_ifn = IFN_COMPLEX_MUL_CONJ; + /* Check if the conjucate is on the first or second parameter. */ + if (!conj_first_operand) + { + this->m_ops.safe_push (left_op[1]); + this->m_ops.safe_push (left_op[0]); + } + else + { + this->m_ops.safe_push (left_op[0]); + this->m_ops.safe_push (left_op[1]); + } + this->m_ops.safe_push (right_op[1]); + this->m_ops.safe_push (right_op[0]); + } + + vect_build_perm_groups (perm_cache, &this->m_blocks[0], this->m_ops); + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node); + + return true; +} + +bool +complex_mul_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache) +{ + complex_operation_t op + = vect_detect_pair_op (*this->m_node); + return matches (op, perm_cache, this->m_ops); +} + + +/* Validates to see if the Complex MUL that we have matched is valid. This is + done through a combination of making nodes linear and combining nodes. */ + +bool +complex_mul_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache) +{ + slp_tree node; + unsigned ix; + hash_set cache; + auto_vec workset; + FOR_EACH_VEC_ELT (*this->m_nodes, ix, node) + { + auto_vec nodes; + nodes.create (this->m_num_args); + slp_tree tmp = NULL; + auto_vec new_nodes; + + unsigned i; + for (i = 0; i < this->m_num_args; i++) + { + unsigned index = (ix * this->m_num_args) + i; + map_t map = this->m_blocks[index]; + slp_tree vnode = NULL; + bool needs_linear = map.a == map.b; + tmp = this->m_ops[index]; + cache.add (tmp); + if (needs_linear + && vect_slp_make_linear (perm_cache, node, tmp, &vnode)) + { + nodes.quick_push (vnode); + if (vnode != tmp) + new_nodes.safe_push (vnode); + } + else if (!needs_linear + && vect_slp_make_combine_linear (perm_cache, node, + this->m_ops, map, &vnode)) + { + nodes.quick_push (vnode); + if (vnode != tmp) + new_nodes.safe_push (vnode); + } + else + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "stmts could not be made %s %p\n", + needs_linear ? "linear" : "linear/combined", + tmp); + FOR_EACH_VEC_ELT (new_nodes, i, tmp) + delete tmp; + + if (!m_inplace) + FOR_EACH_VEC_ELT (workset, i, tmp) + delete tmp; + + nodes.release(); + return false; + } + + vect_mark_stmts_as_in_pattern (&cache, node); + } + + if (m_inplace) + { + workset.safe_splice (nodes); + } + else + { + slp_tree new_node + = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node), + SLP_TREE_CHILDREN (node).length ()); + SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node); + SLP_TREE_LANE_PERMUTATION (new_node) + = SLP_TREE_LANE_PERMUTATION (node); + SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node); + SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node); + SLP_TREE_REPRESENTATIVE (new_node) = SLP_TREE_REPRESENTATIVE (node); + SLP_TREE_CHILDREN (new_node).safe_splice (nodes); + SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node); + + workset.safe_push (new_node); + } + } + + if (m_inplace) + { + SLP_TREE_CHILDREN (*this->m_node).truncate (0); + SLP_TREE_CHILDREN (*this->m_node).safe_splice (workset); + } + else + { + FOR_EACH_VEC_ELT (workset, ix, node) + SLP_TREE_CHILDREN (*this->m_node)[ix] = node; + } + + return true; +} + +/******************************************************************************* + * complex_fma_pattern class + ******************************************************************************/ + +class complex_fma_pattern : public complex_mul_pattern +{ + protected: + /* Temporary nodes cache as scope of the class so we don't + have to worry about freeing it. */ + auto_vec m_ncache; + /* List of nodes to mark as no longer being a mul. The first + one is chosen and the last one is dangling. */ + auto_vec m_mul_nodes; + complex_fma_pattern (slp_tree *node) + : complex_mul_pattern (node) + { + this->m_num_args = 3; + } + + public: + static vect_pattern* create (slp_tree *node) + { + return new complex_fma_pattern (node); + } + + const char* get_name () + { + return "Complex FMA"; + } + + bool matches (slp_tree_to_load_perm_map_t *); + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *, + vec ops); + bool validate_p (slp_tree_to_load_perm_map_t *); +}; + +/* Helper function to "reset" a previously matched node and undo the changes + made enough so that the node is treated as an irrelevant node. */ + +static inline void +vect_slp_reset_pattern (slp_tree node) +{ + stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE (node)); + STMT_VINFO_IN_PATTERN_P (stmt_info) = false; + STMT_SLP_TYPE (stmt_info) = pure_slp; + SLP_TREE_REPRESENTATIVE (node) = stmt_info; +} + +/* Pattern matcher for trying to match complex multiply and accumulate + and multiply and subtract patterns in SLP tree. + If the operation matches then IFN is set to the operation it matched and + the arguments to the two replacement statements are put in m_ops. + + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged. + + This function matches the patterns shaped as: + + double ax = (b[i+1] * a[i]) + (b[i] * a[i]); + double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]); + + c[i] = c[i] - ax; + c[i+1] = c[i+1] + bx; + + If a match occurred then TRUE is returned, else FALSE. The match is + performed after COMPLEX_MUL which would have done the majority of the work. + This function merely matches an ADD with a COMPLEX_MUL IFN. The initial + match is expected to be in OP1 and the initial match operands in args0. */ + +bool +complex_fma_pattern::matches (complex_operation_t op1, + slp_tree_to_load_perm_map_t * /* perm_cache */, + vec /* args0 */) +{ + this->m_ifn = IFN_LAST; + + /* Find the two components. We match Complex MUL first which reduces the + amount of work this pattern has to do. After that we just match the + head node and we're done.: + + * FMA: + +. + + We need to ignore the two_operands nodes that may also match. + For that we can check if they have any scalar statements and also + check that it's not a permute node as we're looking for a normal + PLUS_EXPR operation. */ + if (op1 != CMPLX_NONE) + return false; + + /* Find the two components. We match Complex MUL first which reduces the + amount of work this pattern has to do. After that we just match the + head node and we're done.: + + * FMA: + + on a non-two_operands node. */ + if (SLP_TREE_LANE_PERMUTATION (*this->m_node).exists () + || !SLP_TREE_SCALAR_STMTS (*this->m_node).exists () + || !vect_match_expression_p (*this->m_node, PLUS_EXPR)) + return false; + + slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1]; + + if (vect_match_two_call_p (node, IFN_COMPLEX_MUL)) + this->m_ifn = IFN_COMPLEX_FMA; + else if (vect_match_two_call_p (node, IFN_COMPLEX_MUL_CONJ)) + this->m_ifn = IFN_COMPLEX_FMA_CONJ; + else + return false; + + this->m_mul_nodes.safe_splice (SLP_TREE_CHILDREN (node)); + node = SLP_TREE_CHILDREN (node)[0]; + this->m_ops.create (this->m_num_args); + + if (this->m_ifn == IFN_COMPLEX_FMA) + { + this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]); + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]); + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]); + } + else + { + this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]); + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]); + this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]); + } + + this->m_ncache.safe_push (*this->m_node); + this->m_nodes = &this->m_ncache; + + return true; +} + +bool +complex_fma_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache) +{ + auto_vec args0; + complex_operation_t op + = vect_detect_pair_op (*this->m_node, true, &args0); + return matches (op, perm_cache, args0); +} + +bool +complex_fma_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache) +{ + /* FMA's top level node is + + and so is not two_operator, + we only need one child. */ + slp_tree node; + auto_vec nodes; + auto_vec new_nodes; + nodes.create (this->m_num_args); + hash_set cache; + unsigned ix; + FOR_EACH_VEC_ELT (this->m_ops, ix, node) + { + slp_tree vnode = NULL; + if (vect_slp_make_linear (perm_cache, *this->m_node, node, &vnode)) + { + nodes.quick_push (vnode); + if (vnode != node) + new_nodes.safe_push (vnode); + if (SLP_TREE_CHILDREN (node).length () == 2) + { + cache.add(SLP_TREE_CHILDREN (node)[0]); + cache.add(SLP_TREE_CHILDREN (node)[1]); + } + else + cache.add (node); + } + else + { + FOR_EACH_VEC_ELT (new_nodes, ix, node) + delete node; + + nodes.release(); + return false; + } + } + + SLP_TREE_CHILDREN (*this->m_node).truncate (0); + SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes); + + /* Ignore the part of the tree that becomes irrelevant for FMA. */ + vect_slp_reset_pattern (this->m_mul_nodes[0]); + vect_slp_reset_pattern (this->m_mul_nodes[1]); + vect_mark_stmts_as_in_pattern (&cache, this->m_mul_nodes[1]); + + return true; +} + + +/******************************************************************************* + * complex_fms_pattern class + ******************************************************************************/ + +class complex_fms_pattern : public complex_mul_pattern +{ + protected: + complex_fms_pattern (slp_tree *node) + : complex_mul_pattern (node) + { + this->m_num_args = 3; + } + + public: + static vect_pattern* create (slp_tree *node) + { + return new complex_fms_pattern (node); + } + + const char* get_name () + { + return "Complex FMS"; + } + + bool matches (slp_tree_to_load_perm_map_t *); + bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *, + vec); +}; + +/* Pattern matcher for trying to match complex multiply and accumulate + and multiply and subtract patterns in SLP tree. + If the operation matches then IFN is set to the operation it matched and + the arguments to the two replacement statements are put in m_ops. + + If no match is found then IFN is set to IFN_LAST and m_ops is unchanged. + + This function matches the patterns shaped as: + + double ax = (b[i+1] * a[i]) + (b[i] * a[i]); + double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]); + + c[i] = c[i] - ax; + c[i+1] = c[i+1] + bx; + + If a match occurred then TRUE is returned, else FALSE. The initial match is + expected to be in OP1 and the initial match operands in args0. */ +bool +complex_fms_pattern::matches (complex_operation_t op1, + slp_tree_to_load_perm_map_t *perm_cache, + vec args0) +{ + this->m_ifn = IFN_LAST; + + /* Find the two components. We match Complex MUL first which reduces the + amount of work this pattern has to do. After that we just match the + head node and we're done.: + + * FMS: - +. */ + slp_tree child = NULL; + + /* We need to ignore the two_operands nodes that may also match, + for that we can check if they have any scalar statements and also + check that it's not a permute node as we're looking for a normal + PLUS_EXPR operation. */ + if (op1 != PLUS_MINUS) + return false; + + child = SLP_TREE_CHILDREN (args0[1])[1]; + if (vect_detect_pair_op (child) != MINUS_PLUS) + return false; + + /* First two nodes must be a multiply. */ + auto_vec muls; + if (vect_match_call_complex_mla (child, 0) != MULT_MULT + || vect_match_call_complex_mla (child, 1, &muls) != MULT_MULT) + return false; + + /* Now operand2+4 may lead to another expression. */ + auto_vec left_op, right_op; + left_op.safe_splice (SLP_TREE_CHILDREN (muls[1])); + right_op.safe_splice (SLP_TREE_CHILDREN (muls[0])); + + bool is_neg = vect_normalize_conj_loc (right_op); + + this->m_ops.create (6); + child = SLP_TREE_CHILDREN (args0[0])[0]; + this->m_ops.quick_push (child); + if (!is_neg) + { + this->m_ifn = IFN_COMPLEX_FMS; + this->m_ops.quick_push (left_op[1]); + this->m_ops.quick_push (left_op[0]); + } + else if (is_neg) + { + bool conj_first_operand; + if (!vect_validate_multiplication (perm_cache, left_op, right_op, false, + &conj_first_operand)) + return false; + + this->m_ifn = IFN_COMPLEX_FMS_CONJ; + /* Check if the conjucate is on the first or second parameter. */ + if (!conj_first_operand) + { + this->m_ops.quick_push (left_op[1]); + this->m_ops.quick_push (left_op[0]); + } + else + { + this->m_ops.quick_push (left_op[0]); + this->m_ops.quick_push (left_op[1]); + } + } + + this->m_ops.quick_push (child); + this->m_ops.quick_push (right_op[1]); + this->m_ops.quick_push (right_op[0]); + + vect_build_perm_groups (perm_cache, &this->m_blocks[0], this->m_ops); + this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node); + return true; +} + +bool +complex_fms_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache) +{ + auto_vec args0; + complex_operation_t op + = vect_detect_pair_op (*this->m_node, true, &args0); + return matches (op, perm_cache, args0); +} + + +/******************************************************************************* + * complex_operations_pattern class + ******************************************************************************/ + +/* This function combines all the existing pattern matchers above into one class + that shares the functionality between them. The initial match is shared + between all complex operations. */ + +class complex_operations_pattern : public complex_pattern +{ + protected: + complex_operations_pattern (slp_tree *node) + : complex_pattern (node) + { } + + gcall *build (); + + /* The current "active" pattern. */ + vect_pattern *m_patt = NULL; + + public: + using complex_pattern::matches; + + bool matches (slp_tree_to_load_perm_map_t *perm_cache); + const char* get_name (); + bool validate_p (slp_tree_to_load_perm_map_t *perm_cache); + bool is_optab_supported_p (tree, optimization_type); + internal_fn get_ifn (); + gcall *build (vec_info *); + void reset (slp_tree *); + ~complex_operations_pattern (); + + static vect_pattern* create (slp_tree *node) + { + return new complex_operations_pattern (node); + } +}; + + +/* Clean up the pattern if one existed. */ + +complex_operations_pattern::~complex_operations_pattern () +{ + if (this->m_patt) + delete this->m_patt; + this->m_patt = NULL; +} + +/* Match but do not perform any additional operations on the SLP tree. If the + match is successful then that pattern is made "active" in the class. */ + +bool +complex_operations_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache) +{ + complex_operation_t op + = vect_detect_pair_op (*this->m_node, true, &this->m_ops); + + /* Check which pattern this may be. Match longest pattern first. */ + this->m_patt = complex_fma_pattern::create (this->m_node); + if (this->m_patt->matches (op, perm_cache, this->m_ops)) + return true; + + if (op == CMPLX_NONE) + return false; + + delete this->m_patt; + + this->m_patt = complex_fms_pattern::create (this->m_node); + if (this->m_patt->matches (op, perm_cache, this->m_ops)) + return true; + + delete this->m_patt; + + this->m_patt = complex_mul_pattern::create (this->m_node); + if (this->m_patt->matches (op, perm_cache, this->m_ops)) + return true; + + delete this->m_patt; + + this->m_patt = complex_add_pattern::create (this->m_node); + if (this->m_patt->matches (op, perm_cache, this->m_ops)) + return true; + + delete this->m_patt; + + this->m_patt = NULL; + return false; +} + +/* Friendly name of the operation the pattern matches. */ + +const char* +complex_operations_pattern::get_name () +{ + if (!this->m_patt) + return "Complex Operations"; + + return this->m_patt->get_name (); +} + +/* Only perform the pattern creation part of the matcher. This creates and + returns the new pattern statement and encapsulates the currently active + pattern. */ + +gcall * +complex_operations_pattern::build (vec_info *vinfo) +{ + if (!this->m_patt) + return NULL; + + return this->m_patt->build (vinfo); +} + +/* Validate the operation the currently active pattern detected. */ + +bool +complex_operations_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache) +{ + if (!this->m_patt) + return false; + + return this->m_patt->validate_p (perm_cache); +} + +/* Check if the obtab of the currently active matches is supported. */ + +bool +complex_operations_pattern::is_optab_supported_p (tree vectype, + optimization_type opt_type) +{ + if (!vectype || !this->m_patt) + return false; + + return this->m_patt->is_optab_supported_p (vectype, opt_type); +} + +/* Return the matched IFN from the current active matcher if any. */ + +internal_fn +complex_operations_pattern::get_ifn () +{ + if (!this->m_patt) + return IFN_LAST; + + return this->m_patt->get_ifn (); +} + +/******************************************************************************* + * Pattern matching definitions + ******************************************************************************/ + +#define SLP_PATTERN(x) &x::create +vect_pattern_decl_t slp_patterns[] +{ + /* For least amount of back-tracking and more efficient matching + order patterns from the largest to the smallest. Especially if they + overlap in what they can detect. */ + + /* FMA overlaps with MUL but is the longer sequence. Because we're in post + order traversal we can't match FMA if included in + complex_operations_pattern so must be checked on it's own. */ + SLP_PATTERN (complex_operations_pattern), +}; +#undef SLP_PATTERN + +/* Set the number of SLP pattern matchers available. */ +size_t num__slp_patterns = sizeof(slp_patterns)/sizeof(vect_pattern_decl_t); --