From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, "ook@ucw.cz" <ook@ucw.cz>,
"hongtao.liu@intel.com" <hongtao.liu@intel.com>
Subject: Re: [PATCH] middle-end: Support complex Addition
Date: Thu, 10 Dec 2020 16:59:03 +0000
Message-ID: <20201210165901.GA22156@arm.com>
In-Reply-To: <nycvar.YFH.7.76.2012031347410.4836@elmra.sevgm.obk>
Hi All,
This patch adds support for:
* Complex addition with rotation of 90 and 270 degrees.
Addition with the second argument rotated around the Argand plane.
The supported rotations are 90 and 270 degrees, i.e.
c = a + (b * I) and c = a + (b * I * I * I).
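To make the equivalence concrete, here is a small C sketch (not part of the patch) that checks the interleaved lane formulas documented for cadd90 and cadd270 against c = a + b * I and c = a + b * I * I * I. It assumes the GCC lane ordering described in md.texi (real parts in even lanes, imaginary parts in odd lanes); the names cadd90_lanes, cadd270_lanes, rotations_match and N_ELEMS are hypothetical and chosen only for illustration.

```c
#include <complex.h>
#include <stdbool.h>

/* Hypothetical sketch: N_ELEMS and the helper names are illustrative
   only and are not part of the patch.  */
#define N_ELEMS 4

/* Interleaved form of the ROT90 operation: even lanes hold real parts,
   odd lanes hold imaginary parts.  */
static void
cadd90_lanes (const double *a, const double *b, double *c, int n_lanes)
{
  for (int i = 0; i < n_lanes; i += 2)
    {
      c[i] = a[i] - b[i + 1];      /* Re(a) - Im(b) */
      c[i + 1] = a[i + 1] + b[i];  /* Im(a) + Re(b) */
    }
}

/* Interleaved form of the ROT270 operation.  */
static void
cadd270_lanes (const double *a, const double *b, double *c, int n_lanes)
{
  for (int i = 0; i < n_lanes; i += 2)
    {
      c[i] = a[i] + b[i + 1];      /* Re(a) + Im(b) */
      c[i + 1] = a[i + 1] - b[i];  /* Im(a) - Re(b) */
    }
}

/* Returns true when both lane formulas agree with the complex-arithmetic
   forms c = a + b * I and c = a + b * I * I * I.  */
static bool
rotations_match (void)
{
  double complex a[N_ELEMS] = { 1.0 + 3.0 * I, 2.0 + 3.5 * I,
                                -4.0 + 0.5 * I, 7.25 - 2.0 * I };
  double complex b[N_ELEMS] = { 1.1 + 3.1 * I, 2.1 + 3.6 * I,
                                0.5 - 8.0 * I, -3.0 + 1.5 * I };
  double al[2 * N_ELEMS], bl[2 * N_ELEMS];
  double c90[2 * N_ELEMS], c270[2 * N_ELEMS];

  /* Split the complex values into interleaved real/imag lanes.  */
  for (int k = 0; k < N_ELEMS; k++)
    {
      al[2 * k] = creal (a[k]);
      al[2 * k + 1] = cimag (a[k]);
      bl[2 * k] = creal (b[k]);
      bl[2 * k + 1] = cimag (b[k]);
    }

  cadd90_lanes (al, bl, c90, 2 * N_ELEMS);
  cadd270_lanes (al, bl, c270, 2 * N_ELEMS);

  /* Compare lane results with the scalar complex expressions.  */
  for (int k = 0; k < N_ELEMS; k++)
    {
      double complex r90 = a[k] + b[k] * I;
      double complex r270 = a[k] + b[k] * I * I * I;
      if (creal (r90) != c90[2 * k] || cimag (r90) != c90[2 * k + 1])
        return false;
      if (creal (r270) != c270[2 * k] || cimag (r270) != c270[2 * k + 1])
        return false;
    }
  return true;
}
```

The check uses exact equality, which should hold for finite inputs because multiplying by I only swaps and negates the real and imaginary components.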
For the full code I have pushed a branch at refs/users/tnfchris/heads/complex-numbers.
As a side note, I still needed to set
STMT_SLP_TYPE (call_stmt_info) = pure_slp;
because the new hybrid detection code only runs for loop-aware SLP.
Bootstrapped and regtested on aarch64-none-linux-gnu (AArch64 and SVE) with no
issues; regtested on arm-none-linux-gnueabihf (AArch32 and MVE).
I will commit the patch under the previous approval. Note that this version has
two small changes to fix some ICEs; they are trivial, so I assume the approval
still stands:
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 72bbec4b45d..52757add0e3 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2698,9 +2698,13 @@ again:
STMT_SLP_TYPE (stmt_info) = loop_vect;
if (STMT_VINFO_IN_PATTERN_P (stmt_info))
{
+ stmt_vec_info pattern_stmt_info
+ = STMT_VINFO_RELATED_STMT (stmt_info);
+ if (STMT_VINFO_SLP_VECT_ONLY (pattern_stmt_info))
+ STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
+
gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
- stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
- STMT_SLP_TYPE (stmt_info) = loop_vect;
+ STMT_SLP_TYPE (pattern_stmt_info) = loop_vect;
for (gimple_stmt_iterator pi = gsi_start (pattern_def_seq);
and
+ else if (all_loads.length () == 1)
+ {
+ retval.first = kind;
+ retval.second = all_loads[0];
+ }
at the end of linear_loads_p to handle the case of a single child, where
nothing would otherwise be combined.
Thanks,
Tamar
gcc/ChangeLog:
* tree-vect-slp-patterns.c: New file.
* Makefile.in: Add it.
* doc/passes.texi: Document it.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* doc/md.texi: Document them.
* tree-vect-slp.c:
(vect_free_slp_instance, vect_create_new_slp_node): Export.
(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
(vect_analyze_slp): Use it.
* tree-vectorizer.h (vect_free_slp_tree): Export.
(enum _complex_operation): Forward declare.
(class vect_pattern): New.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
(check_effective_target_vect_complex_add_byte,
check_effective_target_vect_complex_add_int,
check_effective_target_vect_complex_add_short,
check_effective_target_vect_complex_add_long,
check_effective_target_vect_complex_add_half,
check_effective_target_vect_complex_add_float,
check_effective_target_vect_complex_add_double): New.
* gcc.dg/vect/complex/bb-slp-complex-add-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-short.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c: New test.
* gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
* gcc.dg/vect/complex/complex-add-template.c: New test.
* gcc.dg/vect/complex/complex-operations-run.c: New test.
* gcc.dg/vect/complex/complex-operations.c: New test.
* gcc.dg/vect/complex/complex.exp: New file.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/vect-complex-add-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
* gcc.dg/vect/complex/vect-complex-add-short.c: New test.
* gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-unsigned-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-unsigned-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-unsigned-short.c: New test.
The 12/03/2020 13:02, Richard Biener wrote:
> On Thu, 3 Dec 2020, Tamar Christina wrote:
>
> > Hi Richi,
> >
> > Thanks for the reviews, I believe I have addressed all your feedback.
> >
> > If you are happy with this version I will respin the MUL etc patches
> > and have them all done by end of next week along with the respin of
> > the back-end patches.
> >
> > For GCC-12 I would like to do this properly so will get back to you
> > on how you'd like to see it.
>
> (few) comments inline
>
> > Regards,
> > Tamar
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 778ec09c75d9af1cb9f2d5e7582b948c0397db65..d80657b089829fa30cede8bcfe036dda0ec06682 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1646,6 +1646,7 @@ OBJS = \
> > tree-vect-loop.o \
> > tree-vect-loop-manip.o \
> > tree-vect-slp.o \
> > + tree-vect-slp-patterns.o \
> > tree-vectorizer.o \
> > tree-vector-builder.o \
> > tree-vrp.o \
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index da8c9a283dd42e2b3078ed5f370a37180ee0b538..2a030a1d7373cd2b5837aa1c99936a6a4e4e1480 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6154,6 +6154,54 @@ floating-point mode.
> >
> > This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{cadd90@var{m}3} instruction pattern
> > +@item @samp{cadd90@var{m}3}
> > +Perform vector add and subtract on even/odd number pairs. The operation being
> > +matched is semantically described as
> > +
> > +@smallexample
> > + for (int i = 0; i < N; i += 2)
> > + @{
> > + c[i] = a[i] - b[i+1];
> > + c[i+1] = a[i+1] + b[i];
> > + @}
> > +@end smallexample
> > +
> > +This operation is semantically equivalent to performing a vector addition of
> > +complex numbers in operand 1 with operand 2 rotated by 90 degrees around
> > +the Argand plane and storing the result in operand 0.
> > +
> > +In GCC lane ordering the real part of the number must be in the even lanes with
> > +the imaginary part in the odd lanes.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cadd270@var{m}3} instruction pattern
> > +@item @samp{cadd270@var{m}3}
> > +Perform vector add and subtract on even/odd number pairs. The operation being
> > +matched is semantically described as
> > +
> > +@smallexample
> > + for (int i = 0; i < N; i += 2)
> > + @{
> > + c[i] = a[i] + b[i+1];
> > + c[i+1] = a[i+1] - b[i];
> > + @}
> > +@end smallexample
> > +
> > +This operation is semantically equivalent to performing a vector addition of
> > +complex numbers in operand 1 with operand 2 rotated by 270 degrees around
> > +the Argand plane and storing the result in operand 0.
> > +
> > +In GCC lane ordering the real part of the number must be in the even lanes with
> > +the imaginary part in the odd lanes.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > @cindex @code{ffs@var{m}2} instruction pattern
> > @item @samp{ffs@var{m}2}
> > Store into operand 0 one plus the index of the least significant 1-bit
> > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> > index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644
> > --- a/gcc/doc/passes.texi
> > +++ b/gcc/doc/passes.texi
> > @@ -709,7 +709,8 @@ loop.
> > The pass is implemented in @file{tree-vectorizer.c} (the main driver),
> > @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
> > and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
> > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
> > +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
> > Analysis of data references is in @file{tree-data-ref.c}.
> >
> > SLP Vectorization. This pass performs vectorization of straight-line code. The
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1148df8cf4a509b5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
> > DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> > +
> >
> > /* FP scales. */
> > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 5607f51e6b4b775a92d1d8ffcd3e9b53e9270d6c..e9727def4dbf941bb9ac8b56f83f8ea0f52b262c 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
> > OPTAB_D (atanh_optab, "atanh$a2")
> > OPTAB_D (copysign_optab, "copysign$F$a3")
> > OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > +OPTAB_D (cadd270_optab, "cadd270$a3")
> > OPTAB_D (cos_optab, "cos$a2")
> > OPTAB_D (cosh_optab, "cosh$a2")
> > OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..231e2a3fd1053d41d081226172d3c77d7463f7c7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_byte } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int8_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..57c1bbd8f8b47294f6fc1ab6ee5979083ac6db7a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_int } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int32_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..0669f2e286e8e09f9694e549c05412b0713207c9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_long } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int64_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6cb99238f45416166d5e463dbb70b559f2350b56
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_byte } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int8_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..2c8adfa2298c7cc8aac95ed1c1a6075828e6d1a0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_int } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int32_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7ad175dd45a2a70a0857fbf7078187f27e65b41f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_long } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int64_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..5da06316e34425313a0df23ba9a2da942d045405
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_short } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int16_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a582c93f67e16b560c9688dbbf04f4c4171c3bab
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_byte } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint8_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f857f30e0c269f6fc00b29e8b5e6ef4b4ceca47a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_int } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint32_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..14fd224a00d55efd43d48847363c119ec6b2ba66
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_long } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint64_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b3b8095cfad91cb13dad30fdabb4c26e7a9d455e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_short } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint16_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..eba8752c615b752b4dfc75d03327ccc015a67a51
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_short } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int16_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..bda78ac615c9ceb621cd0b330a5d20aadede4bc4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_byte } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint8_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..3f53098b06e0022df7ed785a0fae6ce94293801e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_int } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint32_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..132ac477ef8bb99647470fc3008d619357eedeef
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_long } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint64_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..d4b44d3e8c936132e96618087e5fa25a6a40cfd5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_short } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint16_t
> > +#define N 16
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..e8b8b19d1708673b17564b31d22df3443d667277
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c
> > @@ -0,0 +1,60 @@
> > +void add90 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > +{
> > + for (int i=0; i < N; i+=2)
> > + {
> > + c[i] = a[i] - b[i+1];
> > + c[i+1] = a[i+1] + b[i];
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
> > +
> > +void add270 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > +{
> > + for (int i=0; i < N; i+=2)
> > + {
> > + c[i] = a[i] + b[i+1];
> > + c[i+1] = a[i+1] - b[i];
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > +
> > +void addMixed (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > +{
> > + for (int i=0; i < N; i+=4)
> > + {
> > + c[i] = a[i] - b[i+1];
> > + c[i+1] = a[i+1] + b[i];
> > + c[i+2] = a[i+2] + b[i+3];
> > + c[i+3] = a[i+3] - b[i+2];
> > + }
> > +}
> > +
> > +void add90HandUnrolled (TYPE a[restrict N], TYPE b[restrict N],
> > + TYPE c[restrict N])
> > +{
> > + for (int i=0; i < (N /2); i+=4)
> > + {
> > + c[i] = a[i] - b[i+1];
> > + c[i+2] = a[i+2] - b[i+3];
> > + c[i+1] = a[i+1] + b[i];
> > + c[i+3] = a[i+3] + b[i+2];
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
> > +
> > +void add90Hybrid (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N],
> > + TYPE d[restrict N])
> > +{
> > + for (int i=0; i < N; i+=2)
> > + {
> > + c[i] = a[i] - b[i+1];
> > + c[i+1] = a[i+1] + b[i];
> > + d[i] = a[i] - b[i];
> > + d[i+1] = a[i+1] - b[i+1];
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..dc0cca0bc76887632bdb84d148213d2c688d7698
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
> > @@ -0,0 +1,79 @@
> > +#include <complex.h>
> > +
> > +void add0 (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + b[i];
> > +}
> > +
> > +void add90snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + (b[i] * I);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
> > +
> > +void add180snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + (b[i] * I * I);
> > +}
> > +
> > +void add270snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + (b[i] * I * I * I);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > +
> > +void add90fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = (a[i] * I) + b[i];
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
> > +
> > +void add180fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = (a[i] * I * I) + b[i];
> > +}
> > +
> > +void add270fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = (a[i] * I * I * I) + b[i];
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > +
> > +void addconjfst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = ~a[i] + b[i];
> > +}
> > +
> > +void addconjsnd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + ~b[i];
> > +}
> > +
> > +void addconjboth (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > + TYPE _Complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = ~a[i] + ~b[i];
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a0348a7041ca384104bc5ab688d941c14e5b7381
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
> > @@ -0,0 +1,103 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_complex_add_double } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#include <stdio.h>
> > +#include <complex.h>
> > +#include <string.h>
> > +#include <float.h>
> > +#include <math.h>
> > +
> > +#define PREF old
> > +#pragma GCC push_options
> > +#pragma GCC optimize ("no-tree-vectorize")
> > +# include "complex-operations.c"
> > +#pragma GCC pop_options
> > +#undef PREF
> > +
> > +#define PREF new
> > +# include "complex-operations.c"
> > +#undef PREF
> > +
> > +#define TYPE double
> > +#define TYPE2 double
> > +#define EP pow(2, -45)
> > +
> > +#define xstr(s) str(s)
> > +#define str(s) #s
> > +
> > +#define FCMP(A, B) \
> > + ((fabs (creal (A) - creal (B)) <= EP) && (fabs (cimag (A) - cimag (B)) <= EP))
> > +
> > +#define CMP(A, B) \
> > + (FCMP(A,B) ? "PASS" : "FAIL")
> > +
> > +#define COMPARE(A,B) \
> > + memset (&c1, 0, sizeof (c1)); \
> > + memset (&c2, 0, sizeof (c2)); \
> > + A; B; \
> > + if (!FCMP(c1[0],c2[0]) || !FCMP(c1[1], c2[1])) \
> > + { \
> > + printf ("=> %s vs %s\n", xstr (A), xstr (B)); \
> > + printf ("%a\n", creal (c1[0]) - creal (c2[0])); \
> > + printf ("%a\n", cimag (c1[1]) - cimag (c2[1])); \
> > + printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[0]), cimag (c1[0]), creal (c2[0]), cimag (c2[0]), CMP (c1[0], c2[0])); \
> > + printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[1]), cimag (c1[1]), creal (c2[1]), cimag (c2[1]), CMP (c1[1], c2[1])); \
> > + printf ("\n"); \
> > + __builtin_abort (); \
> > + }
> > +
> > +int main ()
> > +{
> > + TYPE2 complex a[] = { 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I };
> > + TYPE complex b[] = { 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I };
> > + TYPE complex c2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> > + TYPE complex c1[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> > + TYPE diff1, diff2;
> > +
> > + COMPARE(fma0_old(a, b, c1), fma0_new(a, b, c2));
> > + COMPARE(fma90_old(a, b, c1), fma90_new(a, b, c2));
> > + COMPARE(fma180_old(a, b, c1), fma180_new(a, b, c2));
> > + COMPARE(fma270_old(a, b, c1), fma270_new(a, b, c2));
> > + COMPARE(fma0_snd_old(a, b, c1), fma0_snd_new(a, b, c2));
> > + COMPARE(fma90_snd_old(a, b, c1), fma90_snd_new(a, b, c2));
> > + COMPARE(fma180_snd_old(a, b, c1), fma180_snd_new(a, b, c2));
> > + COMPARE(fma270_snd_old(a, b, c1), fma270_snd_new(a, b, c2));
> > + COMPARE(fma_conj_first_old(a, b, c1), fma_conj_first_new(a, b, c2));
> > + COMPARE(fma_conj_second_old(a, b, c1), fma_conj_second_new(a, b, c2));
> > + COMPARE(fma_conj_both_old(a, b, c1), fma_conj_both_new(a, b, c2));
> > + COMPARE(fms0_old(a, b, c1), fms0_new(a, b, c2));
> > + COMPARE(fms90_old(a, b, c1), fms90_new(a, b, c2));
> > + COMPARE(fms180_old(a, b, c1), fms180_new(a, b, c2));
> > + COMPARE(fms270_old(a, b, c1), fms270_new(a, b, c2));
> > + COMPARE(fms0_snd_old(a, b, c1), fms0_snd_new(a, b, c2));
> > + COMPARE(fms90_snd_old(a, b, c1), fms90_snd_new(a, b, c2));
> > + COMPARE(fms180_snd_old(a, b, c1), fms180_snd_new(a, b, c2));
> > + COMPARE(fms270_snd_old(a, b, c1), fms270_snd_new(a, b, c2));
> > + COMPARE(fms_conj_first_old(a, b, c1), fms_conj_first_new(a, b, c2));
> > + COMPARE(fms_conj_second_old(a, b, c1), fms_conj_second_new(a, b, c2));
> > + COMPARE(fms_conj_both_old(a, b, c1), fms_conj_both_new(a, b, c2));
> > + COMPARE(mul0_old(a, b, c1), mul0_new(a, b, c2));
> > + COMPARE(mul90_old(a, b, c1), mul90_new(a, b, c2));
> > + COMPARE(mul180_old(a, b, c1), mul180_new(a, b, c2));
> > + COMPARE(mul270_old(a, b, c1), mul270_new(a, b, c2));
> > + COMPARE(mul0_snd_old(a, b, c1), mul0_snd_new(a, b, c2));
> > + COMPARE(mul90_snd_old(a, b, c1), mul90_snd_new(a, b, c2));
> > + COMPARE(mul180_snd_old(a, b, c1), mul180_snd_new(a, b, c2));
> > + COMPARE(mul270_snd_old(a, b, c1), mul270_snd_new(a, b, c2));
> > + COMPARE(mul_conj_first_old(a, b, c1), mul_conj_first_new(a, b, c2));
> > + COMPARE(mul_conj_second_old(a, b, c1), mul_conj_second_new(a, b, c2));
> > + COMPARE(mul_conj_both_old(a, b, c1), mul_conj_both_new(a, b, c2));
> > + COMPARE(add0_old(a, b, c1), add0_new(a, b, c2));
> > + COMPARE(add90_old(a, b, c1), add90_new(a, b, c2));
> > + COMPARE(add180_old(a, b, c1), add180_new(a, b, c2));
> > + COMPARE(add270_old(a, b, c1), add270_new(a, b, c2));
> > + COMPARE(add0_snd_old(a, b, c1), add0_snd_new(a, b, c2));
> > + COMPARE(add90_snd_old(a, b, c1), add90_snd_new(a, b, c2));
> > + COMPARE(add180_snd_old(a, b, c1), add180_snd_new(a, b, c2));
> > + COMPARE(add270_snd_old(a, b, c1), add270_snd_new(a, b, c2));
> > + COMPARE(add_conj_first_old(a, b, c1), add_conj_first_new(a, b, c2));
> > + COMPARE(add_conj_second_old(a, b, c1), add_conj_second_new(a, b, c2));
> > + COMPARE(add_conj_both_old(a, b, c1), add_conj_both_new(a, b, c2));
> > +}
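The `COMPARE` macro above only inspects `c1[0]` and `c1[1]`. The following is a sketch of an element-wise check over every result, using the same absolute tolerance as `EP` (2^-45). The helper `first_mismatch` and the macro `MISMATCH_EP` are hypothetical and not part of the patch:

```c
#include <complex.h>
#include <math.h>

/* Absolute tolerance, matching EP = pow (2, -45) in the run test.  */
#define MISMATCH_EP 0x1p-45

/* Return the index of the first element of X and Y whose real or imaginary
   parts differ by more than MISMATCH_EP, or -1 if all N elements agree.  */
int first_mismatch (const double complex *x, const double complex *y, int n)
{
  for (int i = 0; i < n; i++)
    if (fabs (creal (x[i]) - creal (y[i])) > MISMATCH_EP
        || fabs (cimag (x[i]) - cimag (y[i])) > MISMATCH_EP)
      return i;
  return -1;
}
```

Used inside `COMPARE` as `first_mismatch (c1, c2, N) >= 0`, such a check would cover all 32 results instead of only the first two.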
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..fdce995481d23c6a536293c8ee59eaf9ca9239bf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
> > @@ -0,0 +1,358 @@
> > +#include <stdio.h>
> > +#include <complex.h>
> > +
> > +#ifndef PREF
> > +#define PREF c
> > +#endif
> > +
> > +#define FX(N,P) P ## _ ## N
> > +#define MK(N,P) FX(P,N)
> > +
> > +#define N 32
> > +#define TYPE double
> > +
> > +// ------ FMA
> > +
> > +// Complex FMA instructions rotating the result
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * b[i] * I;
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * b[i] * I * I;
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * b[i] * I * I * I;
> > +}
> > +
> > +// Complex FMA instructions rotating the second parameter.
> > +
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * (b[i] * I);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * (b[i] * I * I);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * (b[i] * I * I * I);
> > +}
> > +
> > +// Complex FMA instructions with conjugated values.
> > +
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += conj (a[i]) * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += a[i] * conj (b[i]);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fma_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] += conj (a[i]) * conj (b[i]);
> > +}
> > +
> > +// ----- FMS
> > +
> > +// Complex FMS instructions rotating the result
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * b[i] * I;
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * b[i] * I * I;
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * b[i] * I * I * I;
> > +}
> > +
> > +// Complex FMS instructions rotating the second parameter.
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * (b[i] * I);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * (b[i] * I * I);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * (b[i] * I * I * I);
> > +}
> > +
> > +// Complex FMS instructions with conjugated values.
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= conj (a[i]) * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= a[i] * conj (b[i]);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(fms_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] -= conj (a[i]) * conj (b[i]);
> > +}
> > +
> > +
> > +// ----- MUL
> > +
> > +// Complex MUL instructions rotating the result
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * b[i] * I;
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * b[i] * I * I;
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * b[i] * I * I * I;
> > +}
> > +
> > +// Complex MUL instructions rotating the second parameter.
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * (b[i] * I);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * (b[i] * I * I);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * (b[i] * I * I * I);
> > +}
> > +
> > +// Complex MUL instructions with conjugated values.
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = conj (a[i]) * b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] * conj (b[i]);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(mul_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = conj (a[i]) * conj (b[i]);
> > +}
> > +
> > +
> > +// ----- ADD
> > +
> > +// Complex ADD instructions rotating the result
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = (a[i] + b[i]) * I;
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = (a[i] + b[i]) * I * I;
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = (a[i] + b[i]) * I * I * I;
> > +}
> > +
> > +// Complex ADD instructions rotating the second parameter.
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + (b[i] * I);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + (b[i] * I * I);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + (b[i] * I * I * I);
> > +}
> > +
> > +// Complex ADD instructions with conjugated values.
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = conj (a[i]) + b[i];
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = a[i] + conj (b[i]);
> > +}
> > +
> > +__attribute__((noinline,noipa))
> > +void MK(add_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + c[i] = conj (a[i]) + conj (b[i]);
> > +}
> > +
> > +
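The patch description names `c = a + (b * I)` and `c = a + (b * I * I * I)` as the supported forms, which are the `add90_snd` and `add270_snd` kernels. `_Complex` arrays use an interleaved {re, im} layout. On that layout, rotating `b` by 90 or 270 degrees and adding reduces to a lane-wise subtract/add pair with the real and imaginary parts of `b` swapped. The following scalar sketch shows that lowering. The names `cadd_rot90` and `cadd_rot270` are hypothetical, and the sketch relies on C's guarantee that a `double complex` is stored like a `double[2]` holding the real part and then the imaginary part:

```c
#include <complex.h> /* double complex, for callers passing complex arrays.  */

/* c = a + b * I on interleaved {re, im} storage:
   c.re = a.re - b.im, c.im = a.im + b.re.
   N is the number of complex elements, so each array holds 2 * N doubles.  */
void cadd_rot90 (const double *a, const double *b, double *c, int n)
{
  for (int i = 0; i < 2 * n; i += 2)
    {
      c[i] = a[i] - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}

/* c = a + b * I * I * I (that is, a - b * I):
   c.re = a.re + b.im, c.im = a.im - b.re.  */
void cadd_rot270 (const double *a, const double *b, double *c, int n)
{
  for (int i = 0; i < 2 * n; i += 2)
    {
      c[i] = a[i] + b[i + 1];
      c[i + 1] = a[i + 1] - b[i];
    }
}
```

You call the helpers on `double complex` arrays by casting, for example `cadd_rot90 ((const double *) a, (const double *) b, (double *) c, n)`. For the test's input values, the results match `a[i] + b[i] * I` and `a[i] + b[i] * I * I * I` element for element.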
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex.exp b/gcc/testsuite/gcc.dg/vect/complex/complex.exp
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..daeb02820ce3c83af0b5047cc25c7348790e1b8e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex.exp
> > @@ -0,0 +1,20 @@
> > +# Copyright (C) 1997-2020 Free Software Foundation, Inc.
> > +
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 3 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with GCC; see the file COPYING3. If not see
> > +# <http://www.gnu.org/licenses/>.
> > +
> > +# GCC testsuite that uses the `dg.exp' driver.
> > +
> > +# Load support procs.
> > +load_file $srcdir/$subdir/../vect.exp
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9a97d10357741eca73067b41ce7234e87b53a880
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_double } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE double
> > +#define N 16
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..63ca9788063f473483064229836e0d0445ebc747
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_float } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE float
> > +#define N 16
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a6fb4296938112246e98bb45055b7d49df45b5d0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_half } */
> > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE _Float16
> > +#define N 16
> > +#include "complex-add-template.c"
> > +
> > +/* Vectorization currently fails for these cases. They should work, but are marked as expected failures for now. */
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail *-*-* } } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" { xfail *-*-* } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..4c0b9035677f53c792be3e53181eec1e688e0b3e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_double } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE double
> > +#define N 16
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..18ad35316fbd45b228fd9c1b612590df446cf1a5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_float } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE float
> > +#define N 16
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..124b7a7224d3eafe75116adc73debd639e78c1a6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_half } */
> > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE _Float16
> > +#define N 16
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9b285b4f875aa2e8adc8daeb720c2f21e3dce38d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_double } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE double
> > +#define N 200
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f63d38433e53142eb0cc42968682201c1ad32140
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_float } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE float
> > +#define N 200
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..1736ab9037cd555b2d6dffc9b984ede469b9cf84
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_half } */
> > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE _Float16
> > +#define N 200
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6dd621ad1c07138501921241fb37e7f419e963c3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_double } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE double
> > +#define N 200
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..e081abbc5f879385cd76d57359eb18e54cce911f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_float } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE float
> > +#define N 200
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b368e086083c5a4c59a383acce8ec770dca9277c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_half } */
> > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE _Float16
> > +#define N 200
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..96a82df7db4bdc20fe8ff9050acb3a7159ce3760
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_byte } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int8_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a69fad33ddd2b660cbd7a38cb0c05cafc472c907
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_int } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int32_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..76c885e8aa54fb3019b4ed31835a04010c518137
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_long } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int64_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ae35b57c22ad8f9fc5430ae25d0a9e6012aaa4f7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_byte } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int8_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..5b3a6911d54624eaaf32d1bb3a4753e8940050c0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_int } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int32_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..454a0ebef77ce52b71d832cca5af1144cdd80123
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_long } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int64_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..87b96934618a87b63a1660e3ba7a2735f6f65a0a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_short } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int16_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..991c07d24e965fff36d5540bfa90055ca9a34f90
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_byte } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint8_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9c6f9cc5309c7f4eef6b8d0ccdeb1697661f0411
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_int } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint32_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f571afd0a509aa659508977f499049fb2a40a9de
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_long } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint64_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b1ef5a0a0394470d73d654e3a723c7b842136f44
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_short } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint16_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-pattern-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..beeb13b21861099011131ca67c63f98192f69b8c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_short } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE int16_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6e0909cebb4cbc04a816399c77b12bce11814ce6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_byte } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint8_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a0949199ce7e509bb47b91f52465d6d5b10146d3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_int } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint32_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..af589841f8e13129465ac6119412df7f71dd87e1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_long } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint64_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b8ee6c23aa83a6c0a67a907f3a1f8eb3e72cc62a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_complex_add_short } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > +
> > +#define TYPE uint16_t
> > +#define N 200
> > +#include <stdint.h>
> > +#include "complex-add-template.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > \ No newline at end of file
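The new tests only scan the vectorizer dump for COMPLEX_ADD_ROT90 and COMPLEX_ADD_ROT270. complex-add-template.c itself is not part of this excerpt. As a rough guide, the sketch below shows what the two rotations compute on interleaved (real, imaginary) arrays. It is a hypothetical scalar reference (complex_add_rot90 and complex_add_rot270 are illustrative names), not the template's actual contents:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference, NOT the contents of complex-add-template.c.
// Each array holds n / 2 complex values interleaved as {re, im, re, im, ...};
// n is assumed to be even.

// ROT90:  c = a + (b * I)          => (a.re - b.im, a.im + b.re)
template <typename T>
void complex_add_rot90 (T *c, const T *a, const T *b, std::size_t n)
{
  for (std::size_t i = 0; i < n; i += 2)
    {
      c[i] = a[i] - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}

// ROT270: c = a + (b * I * I * I)  => (a.re + b.im, a.im - b.re)
template <typename T>
void complex_add_rot270 (T *c, const T *a, const T *b, std::size_t n)
{
  for (std::size_t i = 0; i < n; i += 2)
    {
      c[i] = a[i] + b[i + 1];
      c[i + 1] = a[i + 1] - b[i];
    }
}
```

The two loops differ only in which lane subtracts, which is why the matcher looks for a MINUS/PLUS pair selected alternately by a lane permute.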
> > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > index 22acda2a74fdfa51aebbc311d5cc84763b0ffc63..aed0768056b78f808b0709987cfbdcc0a1a3216f 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -3355,7 +3355,102 @@ proc check_effective_target_vect_int { } {
> > }}]
> > }
> >
> > -# Return 1 if the target supports signed int->float conversion
> > +# Return 1 if the target supports hardware vectorization of complex additions of
> > +# byte, 0 otherwise.
> > +#
> > +# This won't change for different subtargets so cache the result.
> > +
> > +proc check_effective_target_vect_complex_add_byte { } {
> > + return [check_cached_effective_target_indexed vect_complex_add_byte {
> > + expr {
> > + [check_effective_target_aarch64_sve2]
> > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > + }}]
> > +}
> > +
> > +# Return 1 if the target supports hardware vectorization of complex additions of
> > +# short, 0 otherwise.
> > +#
> > +# This won't change for different subtargets so cache the result.
> > +
> > +proc check_effective_target_vect_complex_add_short { } {
> > + return [check_cached_effective_target_indexed vect_complex_add_short {
> > + expr {
> > + [check_effective_target_aarch64_sve2]
> > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > + }}]
> > +}
> > +
> > +# Return 1 if the target supports hardware vectorization of complex additions of
> > +# int, 0 otherwise.
> > +#
> > +# This won't change for different subtargets so cache the result.
> > +
> > +proc check_effective_target_vect_complex_add_int { } {
> > + return [check_cached_effective_target_indexed vect_complex_add_int {
> > + expr {
> > + [check_effective_target_aarch64_sve2]
> > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > + }}]
> > +}
> > +
> > +# Return 1 if the target supports hardware vectorization of complex additions of
> > +# long, 0 otherwise.
> > +#
> > +# This won't change for different subtargets so cache the result.
> > +
> > +proc check_effective_target_vect_complex_add_long { } {
> > + return [check_cached_effective_target_indexed vect_complex_add_long {
> > + expr {
> > + [check_effective_target_aarch64_sve2]
> > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > + }}]
> > +}
> > +
> > +# Return 1 if the target supports hardware vectorization of complex additions of
> > +# half, 0 otherwise.
> > +#
> > +# This won't change for different subtargets so cache the result.
> > +
> > +proc check_effective_target_vect_complex_add_half { } {
> > + return [check_cached_effective_target_indexed vect_complex_add_half {
> > + expr {
> > + ([check_effective_target_arm_v8_3a_complex_neon_ok]
> > + && [check_effective_target_arm_v8_2a_fp16_neon_ok])
> > + || [check_effective_target_aarch64_sve2]
> > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > + }}]
> > +}
> > +
> > +# Return 1 if the target supports hardware vectorization of complex additions of
> > +# float, 0 otherwise.
> > +#
> > +# This won't change for different subtargets so cache the result.
> > +
> > +proc check_effective_target_vect_complex_add_float { } {
> > + return [check_cached_effective_target_indexed vect_complex_add_float {
> > + expr {
> > + [check_effective_target_arm_v8_3a_complex_neon_ok]
> > + || [check_effective_target_aarch64_sve2]
> > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > + }}]
> > +}
> > +
> > +# Return 1 if the target supports hardware vectorization of complex additions of
> > +# double, 0 otherwise.
> > +#
> > +# This won't change for different subtargets so cache the result.
> > +
> > +proc check_effective_target_vect_complex_add_double { } {
> > + return [check_cached_effective_target_indexed vect_complex_add_double {
> > + expr {
> > + [check_effective_target_arm_v8_3a_complex_neon_ok]
> > + || [check_effective_target_aarch64_sve2]
> > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > + }}]
> > +}
> > +
> > +# Return 1 if the target supports signed int->float conversion
> > #
> >
> > proc check_effective_target_vect_intfloat_cvt { } {
> > @@ -10374,13 +10469,13 @@ proc check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } {
> > # need to be added to the -march option.
> > foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat-abi=hard -mfpu=auto"} {
> > if { [check_no_compiler_messages_nocache \
> > - arm_v8_3a_complex_neon_ok object {
> > + arm_v8_3a_complex_neon_ok assembly {
> > #if !defined (__ARM_FEATURE_COMPLEX)
> > #error "__ARM_FEATURE_COMPLEX not defined"
> > #endif
> > } "$flags -march=armv8.3-a"] } {
> > set et_arm_v8_3a_complex_neon_flags "$flags -march=armv8.3-a"
> > - return 1
> > + return 1;
> > }
> > }
> >
> > @@ -10400,13 +10495,57 @@ proc add_options_for_arm_v8_3a_complex_neon { flags } {
> > return "$flags $et_arm_v8_3a_complex_neon_flags"
> > }
> >
> > +# Return 1 if the target supports ARMv8.3 Adv.SIMD + FP16 Complex
> > +# instructions, 0 otherwise. The test is valid for ARM and for AArch64.
> > +# Record the command line options needed.
> > +
> > +proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache { } {
> > + global et_arm_v8_3a_fp16_complex_neon_flags
> > + set et_arm_v8_3a_fp16_complex_neon_flags ""
> > +
> > + if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> > + return 0;
> > + }
> > +
> > + # Iterate through sets of options to find the compiler flags that
> > + # need to be added to the -march option.
> > + foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat-abi=hard -mfpu=auto"} {
> > + if { [check_no_compiler_messages_nocache \
> > + arm_v8_3a_fp16_complex_neon_ok assembly {
> > + #if !defined (__ARM_FEATURE_COMPLEX)
> > + #error "__ARM_FEATURE_COMPLEX not defined"
> > + #endif
> > + } "$flags -march=armv8.3-a+fp16"] } {
> > + set et_arm_v8_3a_fp16_complex_neon_flags \
> > + "$flags -march=armv8.3-a+fp16"
> > + return 1;
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok { } {
> > + return [check_cached_effective_target arm_v8_3a_fp16_complex_neon_ok \
> > + check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache]
> > +}
> > +
> > +proc add_options_for_arm_v8_3a_fp16_complex_neon { flags } {
> > + if { ! [check_effective_target_arm_v8_3a_fp16_complex_neon_ok] } {
> > + return "$flags"
> > + }
> > + global et_arm_v8_3a_fp16_complex_neon_flags
> > + return "$flags $et_arm_v8_3a_fp16_complex_neon_flags"
> > +}
> > +
> > +
> > # Return 1 if the target supports executing AdvSIMD instructions from ARMv8.3
> > # with the complex instruction extension, 0 otherwise. The test is valid for
> > # ARM and for AArch64.
> >
> > proc check_effective_target_arm_v8_3a_complex_neon_hw { } {
> > if { ![check_effective_target_arm_v8_3a_complex_neon_ok] } {
> > - return 0;
> > + return 0;
> > }
> > return [check_runtime arm_v8_3a_complex_neon_hw_available {
> > #include "arm_neon.h"
> > @@ -10431,7 +10570,7 @@ proc check_effective_target_arm_v8_3a_complex_neon_hw { } {
> > : /* No clobbers. */);
> > #endif
> >
> > - return (results[0] == 8 && results[1] == 24) ? 1 : 0;
> > + return (results[0] == 8 && results[1] == 24) ? 0 : 1;
> > }
> > } [add_options_for_arm_v8_3a_complex_neon ""]]
> > }
> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > index f2ce75aac3ebf39a5a6001c5cc33cf94a2942486..3d599b9fb39527229eff01fa9b7f94046bd8adfe 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -5281,7 +5281,7 @@ const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
> >
> > /* Mark statements that are involved in a pattern. */
> >
> > -static inline void
> > +void
> > vect_mark_pattern_stmts (vec_info *vinfo,
> > stmt_vec_info orig_stmt_info, gimple *pattern_stmt,
> > tree pattern_vectype)
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7786560b4d6a35555288df7e5b8d2f0238c0abd6
> > --- /dev/null
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -0,0 +1,714 @@
> > +/* SLP - Pattern matcher on SLP trees
> > + Copyright (C) 2020 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3. If not see
> > +<http://www.gnu.org/licenses/>. */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "target.h"
> > +#include "rtl.h"
> > +#include "tree.h"
> > +#include "gimple.h"
> > +#include "tree-pass.h"
> > +#include "ssa.h"
> > +#include "optabs-tree.h"
> > +#include "insn-config.h"
> > +#include "recog.h" /* FIXME: for insn_data */
> > +#include "fold-const.h"
> > +#include "stor-layout.h"
> > +#include "gimple-iterator.h"
> > +#include "cfgloop.h"
> > +#include "tree-vectorizer.h"
> > +#include "langhooks.h"
> > +#include "gimple-walk.h"
> > +#include "dbgcnt.h"
> > +#include "tree-vector-builder.h"
> > +#include "vec-perm-indices.h"
> > +#include "gimple-fold.h"
> > +#include "internal-fn.h"
> > +
> > +/* SLP Pattern matching mechanism.
> > +
> > + This extension to the SLP vectorizer allows one to transform the generated SLP
> > + tree based on any pattern. Unlike the normal vect pattern matcher, this
> > + matcher can match instructions that do not belong to the same SSA dominator
> > + graph.
> > +
> > + The only requirement this pattern matcher imposes is that you must match
> > + either an entire group or none of it.
> > +
> > + The pattern matcher currently only allows you to perform replacements to
> > + internal functions.
> > +
> > + Matching is one-way: once a pattern has been matched it cannot be undone.
> > + Matching patterns recursively is currently not supported.
> > +
> > + To add a new pattern, implement the vect_pattern class and add the type to
> > + slp_patterns.
> > +
> > +*/
> > +
> > +/*******************************************************************************
> > + * vect_pattern class
> > + ******************************************************************************/
> > +
> > +/* Return true if IFN is a valid internal function that the target supports
> > + for the vector type of NODE, false otherwise. */
> > +
> > +static bool
> > +vect_pattern_validate_optab (internal_fn ifn, slp_tree node)
> > +{
> > + tree vectype = SLP_TREE_VECTYPE (node);
> > + if (ifn == IFN_LAST || !vectype)
> > + return false;
> > +
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Found %s pattern in SLP tree\n",
> > + internal_fn_name (ifn));
> > +
> > + if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Target supports %s vectorization with mode %T\n",
> > + internal_fn_name (ifn), vectype);
> > + }
> > + else
> > + {
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Target does not support %s for vector type "
> > + "%T\n", internal_fn_name (ifn), vectype);
> > + return false;
> > + }
> > + return true;
> > +}
> > +
> > +/*******************************************************************************
> > + * General helper types
> > + ******************************************************************************/
> > +
> > +/* The COMPLEX_OPERATION enum denotes the possible pairs of operations that can
> > + be matched when looking for expressions of interest for complex number
> > + addition and MLA. */
> > +
> > +typedef enum _complex_operation : unsigned {
> > + PLUS_PLUS,
> > + MINUS_PLUS,
> > + PLUS_MINUS,
> > + MULT_MULT,
> > + CMPLX_NONE
> > +} complex_operation_t;
> > +
> > +/*******************************************************************************
> > + * General helper functions
> > + ******************************************************************************/
> > +
> > +/* Helper function of linear_loads_p that checks to see if the load permutation
> > + is sequential and in monotonically increasing order of loads with no gaps.
> > +*/
> > +
> > +static inline complex_perm_kinds_t
> > +is_linear_load_p (load_permutation_t loads)
> > +{
> > + if (loads.length() == 0)
> > + return PERM_UNKNOWN;
> > +
> > + unsigned load, i;
> > + complex_perm_kinds_t candidates[4]
> > + = { PERM_EVENODD
> > + , PERM_ODDEVEN
> > + , PERM_ODDODD
> > + , PERM_EVENEVEN
> > + };
> > +
> > + int valid_patterns = 4;
> > + FOR_EACH_VEC_ELT (loads, i, load)
> > + {
> > + if (candidates[0] != PERM_UNKNOWN && load != i)
> > + {
> > + candidates[0] = PERM_UNKNOWN;
> > + valid_patterns--;
> > + }
> > + if (candidates[1] != PERM_UNKNOWN
> > + && load != (i % 2 == 0 ? i + 1 : i - 1))
> > + {
> > + candidates[1] = PERM_UNKNOWN;
> > + valid_patterns--;
> > + }
> > + if (candidates[2] != PERM_UNKNOWN && load != 1)
> > + {
> > + candidates[2] = PERM_UNKNOWN;
> > + valid_patterns--;
> > + }
> > + if (candidates[3] != PERM_UNKNOWN && load != 0)
> > + {
> > + candidates[3] = PERM_UNKNOWN;
> > + valid_patterns--;
> > + }
> > +
> > + if (valid_patterns == 0)
> > + return PERM_UNKNOWN;
> > + }
> > +
> > + for (i = 0; i < ARRAY_SIZE (candidates); i++)
> > + if (candidates[i] != PERM_UNKNOWN)
> > + return candidates[i];
> > +
> > + return PERM_UNKNOWN;
> > +}
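To make the four permute kinds concrete, the following standalone sketch re-implements the classification outside GCC with std::vector. PermKind and classify_loads are hypothetical names. It checks every index starting at 0; skipping index 0 would, for example, let {1, 1} be classified as EVENODD instead of ODDODD.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the patch's complex_perm_kinds_t.
enum class PermKind { EvenOdd, OddEven, OddOdd, EvenEven, Unknown };

// Classify a load permutation the way is_linear_load_p does, checking every
// index from 0.  Candidates are tried in the same order as the patch's array.
PermKind classify_loads (const std::vector<unsigned> &loads)
{
  if (loads.empty ())
    return PermKind::Unknown;

  bool evenodd = true;   // {0, 1, 2, 3, ...}
  bool oddeven = true;   // {1, 0, 3, 2, ...}
  bool oddodd = true;    // {1, 1, ...}
  bool eveneven = true;  // {0, 0, ...}
  for (std::size_t i = 0; i < loads.size (); i++)
    {
      std::size_t load = loads[i];
      evenodd = evenodd && load == i;
      oddeven = oddeven && load == (i % 2 == 0 ? i + 1 : i - 1);
      oddodd = oddodd && load == 1;
      eveneven = eveneven && load == 0;
    }

  if (evenodd)
    return PermKind::EvenOdd;
  if (oddeven)
    return PermKind::OddEven;
  if (oddodd)
    return PermKind::OddOdd;
  if (eveneven)
    return PermKind::EvenEven;
  return PermKind::Unknown;
}
```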
> > +
> > +/* Combine complex_perm_kinds A and B into a new permute kind that describes the
> > + resulting operation. */
> > +
> > +static inline complex_perm_kinds_t
> > +vect_merge_perms (complex_perm_kinds_t a, complex_perm_kinds_t b)
> > +{
> > + if (a == b)
> > + return a;
> > +
> > + if (a == PERM_TOP)
> > + return b;
> > +
> > + if (b == PERM_TOP)
> > + return a;
> > +
> > + return PERM_UNKNOWN;
> > +}
> > +
> > +/* Check to see if all loads rooted in ROOT are linear. Linearity is
> > + defined as having no gaps between values loaded. */
> > +
> > +static complex_load_perm_t
> > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root)
> > +{
> > + if (!root)
> > + return std::make_pair (PERM_UNKNOWN, vNULL);
> > +
> > + unsigned i;
> > + complex_load_perm_t *tmp;
> > +
> > + if ((tmp = perm_cache->get (root)) != NULL)
> > + return *tmp;
> > +
> > + complex_load_perm_t retval = std::make_pair (PERM_UNKNOWN, vNULL);
> > + perm_cache->put (root, retval);
> > +
> > + /* If it's a load node, then just read the load permute. */
> > + if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> > + {
> > + retval.first = is_linear_load_p (SLP_TREE_LOAD_PERMUTATION (root));
> > + retval.second = SLP_TREE_LOAD_PERMUTATION (root);
> > + perm_cache->put (root, retval);
> > + return retval;
> > + }
> > + else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> > + {
> > + retval.first = PERM_TOP;
> > + return retval;
> > + }
> > +
> > + auto_vec<load_permutation_t> all_loads;
> > + complex_perm_kinds_t kind = PERM_TOP;
> > +
> > + slp_tree child;
> > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > + {
> > + complex_load_perm_t res = linear_loads_p (perm_cache, child);
> > + kind = vect_merge_perms (kind, res.first);
> > + if (kind == PERM_UNKNOWN)
> > + return retval;
> > + all_loads.safe_push (res.second);
> > + }
> > +
> > + if (SLP_TREE_LANE_PERMUTATION (root).exists ())
> > + {
> > + lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
> > + load_permutation_t nloads;
> > + nloads.create (SLP_TREE_LANES (root));
> > + nloads.quick_grow (SLP_TREE_LANES (root));
> > + for (i = 0; i < SLP_TREE_LANES (root); i++)
> > + nloads[i] = all_loads[perm[i].first][perm[i].second];
> > +
> > + retval.first = kind;
> > + retval.second = nloads;
> > + }
> > +
> > + perm_cache->put (root, retval);
> > + return retval;
> > +}
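linear_loads_p pushes the children's load permutations through a node's lane permutation with nloads[i] = all_loads[perm[i].first][perm[i].second]. A minimal standalone sketch of that composition step follows. LoadPerm, LanePerm and compose_lane_perm are hypothetical stand-ins for GCC's vec-based load_permutation_t and lane_permutation_t:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for GCC's load_permutation_t / lane_permutation_t.
using LoadPerm = std::vector<unsigned>;
using LanePerm = std::vector<std::pair<unsigned, unsigned>>; // (child, lane)

// For each output lane, look up which child and lane it selects and copy that
// child's load index: result[i] = child_loads[perm[i].first][perm[i].second].
LoadPerm compose_lane_perm (const std::vector<LoadPerm> &child_loads,
                            const LanePerm &perm)
{
  LoadPerm result;
  result.reserve (perm.size ());
  for (const auto &sel : perm)
    result.push_back (child_loads.at (sel.first).at (sel.second));
  return result;
}
```

The sketch uses bounds-checked at () where the patch indexes directly, so a malformed permute fails loudly here instead of reading out of range.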
> > +
> > +
> > +/* This function attempts to make the node rooted in NODE linear. If the node
> > + is already linear then the node itself is returned in RESULT.
> > +
> > + If the node is not linear then a new VEC_PERM_EXPR node is created with a
> > + lane permute that when applied will make the node linear. If such a
> > + permute cannot be created then FALSE is returned from the function.
> > +
> > + Here linearity is defined as having a sequential, monotonically increasing
> > + load position inside the load permute generated by the loads reachable from
> > + NODE. */
> > +
> > +static slp_tree
> > +vect_build_linear_node (slp_tree node)
>
> can you name it vect_build_swap_evenodd_node () since that is what it
> does?
>
> > +{
> > + /* Attempt to linearise the permute. */
> > + vec<std::pair<unsigned, unsigned> > zipped;
> > + zipped.create (SLP_TREE_LANES (node));
> > +
> > + for (unsigned x = 0; x < SLP_TREE_LANES (node); x+=2)
> > + {
> > + zipped.quick_push (std::make_pair (0, x+1));
> > + zipped.quick_push (std::make_pair (0, x));
> > + }
> > +
> > + /* Create the new permute node and store it instead. */
> > + slp_tree vnode = vect_create_new_slp_node (1, VEC_PERM_EXPR);
> > + SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
> > + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (node);
> > + SLP_TREE_CHILDREN (vnode).quick_push (node);
> > + SLP_TREE_REF_COUNT (vnode) = 1;
> > + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
> > + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (node);
> > + SLP_TREE_REF_COUNT (node)++;
> > + return vnode;
> > +}
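The permute that vect_build_linear_node attaches selects, from its single child, lane x+1 and then lane x for every pair, i.e. {1, 0, 3, 2, ...}. This is why the name vect_build_swap_evenodd_node suggested in the review fits. The sketch below builds that selection and applies it to plain lane values. swap_evenodd_perm and apply_perm are hypothetical names, and LANES is assumed to be even (one lane per real/imaginary half):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for GCC's lane_permutation_t: (child, lane) pairs.
using LanePerm = std::vector<std::pair<unsigned, unsigned>>;

// Build the selection vect_build_linear_node installs on its VEC_PERM_EXPR:
// everything comes from child 0, with lanes x and x + 1 swapped for each
// even x.  LANES is assumed to be even.
LanePerm swap_evenodd_perm (unsigned lanes)
{
  LanePerm perm;
  perm.reserve (lanes);
  for (unsigned x = 0; x + 1 < lanes; x += 2)
    {
      perm.emplace_back (0u, x + 1);
      perm.emplace_back (0u, x);
    }
  return perm;
}

// Apply PERM to the lane values of a single child.
template <typename T>
std::vector<T> apply_perm (const std::vector<T> &child, const LanePerm &perm)
{
  std::vector<T> out;
  out.reserve (perm.size ());
  for (const auto &sel : perm)
    out.push_back (child.at (sel.second));
  return out;
}
```

Applying the swap twice gives back the original lanes, so it is its own inverse.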
> > +
> > +/* Checks to see if the expression represented by NODE is a gimple assign with
> > + code CODE. */
> > +
> > +static inline bool
> > +vect_match_expression_p (slp_tree node, tree_code code)
> > +{
> > + if (!node
> > + || !SLP_TREE_REPRESENTATIVE (node))
> > + return false;
> > +
> > + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
> > + if (!is_gimple_assign (expr)
> > + || gimple_assign_rhs_code (expr) != code)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +/* Check if the given lane permute in PERMUTES matches an alternating sequence
> > + of {P0 P1 P0 P1 ...}. This is to account for unrolled loops. Furthermore,
> > + the resulting permute must be linear. */
> > +
> > +static inline bool
> > +vect_check_lane_permute (lane_permutation_t &permutes,
> > + unsigned p0, unsigned p1)
>
> vect_check_evenodd_blend (lane_permutation_t &permutes,
> unsigned even, unsigned odd)
>
> ? It matches { p0[seed], p1[seed+1], p0[seed+2], p1[seed+3], ... }
> though 'seed' could be odd - is it necessary to support seed != 0?
>
> > +{
> > + if (permutes.length () == 0)
> > + return false;
> > +
> > + unsigned val[2] = {p0, p1};
> > + unsigned seed = permutes[0].second;
> > + for (unsigned i = 0; i < permutes.length (); i++)
> > + if (permutes[i].first != val[i % 2]
> > + || permutes[i].second != seed++)
> > + return false;
> > +
> > + return true;
> > +}
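As the review comment notes, vect_check_lane_permute accepts { p0[seed], p1[seed+1], p0[seed+2], ... } and takes seed from the first entry, so an odd seed is accepted too. The following sketch mirrors that loop so the accepted and rejected shapes can be checked directly. check_evenodd_blend and LanePerm are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for GCC's lane_permutation_t: (child, lane) pairs.
using LanePerm = std::vector<std::pair<unsigned, unsigned>>;

// Mirror of vect_check_lane_permute: accept
//   { P0[seed], P1[seed + 1], P0[seed + 2], P1[seed + 3], ... }
// where SEED is whatever lane the first entry selects.
bool check_evenodd_blend (const LanePerm &perm, unsigned p0, unsigned p1)
{
  if (perm.empty ())
    return false;

  const unsigned val[2] = {p0, p1};
  unsigned seed = perm[0].second;
  for (std::size_t i = 0; i < perm.size (); i++)
    if (perm[i].first != val[i % 2] || perm[i].second != seed++)
      return false;
  return true;
}
```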
> > +
> > +/* This function will match the two gimple expressions representing NODE1 and
> > + NODE2 in parallel and return the pair operation that represents the two
> > + expressions in the two statements.
> > +
> > + If the match is successful then the corresponding complex_operation is
> > + returned and the two matched nodes, NODE1 and NODE2, are returned in OPS.
> > +
> > + If TWO_OPERANDS it is expected that the LANES of the parent VEC_PERM select
> > + from the two nodes alternatingly.
> > +
> > + If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
> > +
> > + e.g. the following gimple statements
> > +
> > + stmt 0 _39 = _37 + _12;
> > + stmt 1 _6 = _38 - _36;
> > +
> > + will return PLUS_MINUS along with OPS containing the nodes for stmt 0 and
> > + stmt 1, whose operands are {_37, _12} and {_38, _36} respectively.
> > +*/
> > +
> > +static complex_operation_t
> > +vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t &lanes,
> > + bool two_operands = true, vec<slp_tree> *ops = NULL)
> > +{
> > + complex_operation_t result = CMPLX_NONE;
> > +
> > + if (vect_match_expression_p (node1, MINUS_EXPR)
> > + && vect_match_expression_p (node2, PLUS_EXPR)
> > + && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > + result = MINUS_PLUS;
> > + else if (vect_match_expression_p (node1, PLUS_EXPR)
> > + && vect_match_expression_p (node2, MINUS_EXPR)
> > + && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > + result = PLUS_MINUS;
> > + else if (vect_match_expression_p (node1, PLUS_EXPR)
> > + && vect_match_expression_p (node2, PLUS_EXPR))
> > + result = PLUS_PLUS;
> > + else if (vect_match_expression_p (node1, MULT_EXPR)
> > + && vect_match_expression_p (node2, MULT_EXPR))
> > + result = MULT_MULT;
> > +
> > + if (result != CMPLX_NONE && ops != NULL)
> > + {
> > + ops->create (2);
> > + ops->quick_push (node1);
> > + ops->quick_push (node2);
> > + }
> > + return result;
> > +}
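Stripped of the SLP plumbing, vect_detect_pair_op is a small decision table over the codes of the two representative statements. The sketch below uses hypothetical enums (Code, PairOp) in place of tree_code and complex_operation_t. A boolean, blend_ok, stands in for the vect_check_lane_permute (lanes, 0, 1) result:

```cpp
#include <cassert>

// Hypothetical stand-ins for the tree codes and complex_operation_t values.
enum class Code { Plus, Minus, Mult, Other };
enum class PairOp { PlusPlus, MinusPlus, PlusMinus, MultMult, None };

// Decision table of vect_detect_pair_op.  BLEND_OK stands in for the result
// of vect_check_lane_permute (lanes, 0, 1); it is ignored unless TWO_OPERANDS.
PairOp detect_pair_op (Code c1, Code c2, bool two_operands, bool blend_ok)
{
  const bool lanes_ok = !two_operands || blend_ok;
  if (c1 == Code::Minus && c2 == Code::Plus && lanes_ok)
    return PairOp::MinusPlus;
  if (c1 == Code::Plus && c2 == Code::Minus && lanes_ok)
    return PairOp::PlusMinus;
  if (c1 == Code::Plus && c2 == Code::Plus)
    return PairOp::PlusPlus;
  if (c1 == Code::Mult && c2 == Code::Mult)
    return PairOp::MultMult;
  return PairOp::None;
}
```

Only the mixed PLUS/MINUS pairs consult the lane blend. PLUS_PLUS and MULT_MULT are returned regardless of the lane permutation.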
> > +
> > +/* Overload of vect_detect_pair_op that matches against the representative
> > + statements in the children of NODE. It is expected that NODE has exactly
> > + two children and when TWO_OPERANDS then NODE must be a VEC_PERM. */
> > +
> > +static complex_operation_t
> > +vect_detect_pair_op (slp_tree node, bool two_operands = true,
> > + vec<slp_tree> *ops = NULL)
> > +{
> > + if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> > + return CMPLX_NONE;
> > +
> > + if (SLP_TREE_CHILDREN (node).length () != 2)
> > + return CMPLX_NONE;
> > +
> > + vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > + lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
> > +
> > + return vect_detect_pair_op (children[0], children[1], lanes, two_operands,
> > + ops);
> > +}
> > +
> > +/*******************************************************************************
> > + * complex_pattern class
> > + ******************************************************************************/
> > +
> > +/* SLP Complex Numbers pattern matching.
> > +
> > + As an example, the following simple loop:
> > +
> > + double a[restrict N]; double b[restrict N]; double c[restrict N];
> > +
> > + for (int i=0; i < N; i+=2)
> > + {
> > + c[i] = a[i] - b[i+1];
> > + c[i+1] = a[i+1] + b[i];
> > + }
> > +
> > + which represents a complex addition with a rotation of 90 degrees around the
> > + Argand plane, i.e. if `a` and `b` were complex numbers then this would be the
> > + same as `a + (b * I)`.
> > +
> > + Here the expressions for `c[i]` and `c[i+1]` are independent but both have
> > + to be recognized in order for the pattern to work. As an SLP tree this is
> > + represented as
> > +
> > + +--------------------------------+
> > + | stmt 0 *_9 = _10; |
> > + | stmt 1 *_15 = _16; |
> > + +--------------------------------+
> > + |
> > + |
> > + v
> > + +--------------------------------+
> > + | stmt 0 _10 = _4 - _8; |
> > + | stmt 1 _16 = _12 + _14; |
> > + | lane permutation { 0[0] 1[1] } |
> > + +--------------------------------+
> > + | |
> > + | |
> > + | |
> > + +-----+ | | +-----+
> > + | | | | | |
> > + +-----| { } |<-----+ +----->| { } --------+
> > + | | | +------------------| | |
> > + | +-----+ | +-----+ |
> > + | | | |
> > + | | | |
> > + | +------|------------------+ |
> > + | | | |
> > + v v v v
> > + +--------------------------+ +--------------------------------+
> > + | stmt 0 _8 = *_7; | | stmt 0 _4 = *_3; |
> > + | stmt 1 _14 = *_13; | | stmt 1 _12 = *_11; |
> > + | load permutation { 1 0 } | | load permutation { 0 1 } |
> > + +--------------------------+ +--------------------------------+
> > +
> > + The pattern matcher allows you to replace both statements 0 and 1 or none at
> > + all. Because this operation is a two-operand operation the actual nodes
> > + being replaced are those in the { } nodes. The actual scalar statements
> > + themselves are not replaced or used during the matching; instead the
> > + SLP_TREE_REPRESENTATIVE statements are inspected. You are also allowed to
> > + replace and match on any number of nodes.
> > +
> > + Because the pattern matcher matches on the representative statement of the
> > + SLP node, in the two_operators case it can match on the children of the
> > + node. This is done using the method `recognize ()`.
> > +
> > +*/
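Here is a worked check of the 90 degree claim above. Write the interleaved pairs as a_r = a[i] and a_i = a[i+1], and likewise for b and c. Multiplying b by I rotates it a quarter turn:

```latex
\begin{aligned}
b \cdot I &= (b_r + b_i I)\, I = -b_i + b_r I \\
c = a + b \cdot I &= (a_r - b_i) + (a_i + b_r)\, I \\
\Longrightarrow\; c[i] &= a[i] - b[i+1], \qquad c[i+1] = a[i+1] + b[i]
\end{aligned}
```

The second operand is therefore consumed with its two lanes swapped: b[i+1] feeds the even lane and b[i] the odd lane. This corresponds to the { 1 0 } load permutation in the SLP tree drawn above.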
> > +
> > +/* The complex_pattern class contains common code for pattern matchers that work
> > + on complex numbers. These provide functionality to allow de-construction and
> > + validation of sequences depicting/transforming REAL and IMAG pairs. */
> > +
> > +class complex_pattern : public vect_pattern
> > +{
> > + protected:
> > + auto_vec<slp_tree> m_workset;
> > + complex_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > + : vect_pattern (node, m_ops, ifn)
> > + {
> > + this->m_workset.safe_push (*node);
> > + }
> > +
> > + public:
> > + void build (vec_info *);
> > +
> > + static internal_fn
> > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > + vec<slp_tree> *);
> > +};
> > +
> > +/* Create a replacement pattern statement for each node in m_node and insert
> > + the new statement into m_node as the new representative statement. The old
> > + statement is marked as being in a pattern defined by the new statement. The
> > + statement is created as a call to internal function IFN with m_num_args
> > + arguments.
> > +
> > + Furthermore the new pattern is also added to the vectorization information
> > + structure VINFO and the old statement STMT_INFO is marked as unused while
> > + the new statement is marked as used and the number of SLP uses of the new
> > + statement is incremented.
> > +
> > + The newly created SLP nodes are marked as SLP only and will be dissolved
> > + if SLP is aborted.
> > +
> > + The newly created gimple call becomes the node's representative statement;
> > + the BB itself remains unchanged.
> > +
> > + This default method is designed to only match against simple operands where
> > + all the input and output types are the same.
> > +*/
> > +
> > +void
> > +complex_pattern::build (vec_info *vinfo)
> > +{
> > + stmt_vec_info stmt_info;
> > +
> > + auto_vec<tree> args;
> > + args.create (this->m_num_args);
> > + args.quick_grow_cleared (this->m_num_args);
> > + slp_tree node;
> > + unsigned ix;
> > + stmt_vec_info call_stmt_info;
> > + gcall *call_stmt = NULL;
> > +
> > + /* Now modify the nodes themselves. */
> > + FOR_EACH_VEC_ELT (this->m_workset, ix, node)
> > + {
> > + /* Calculate the location of the statement in NODE to replace. */
> > + stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > + gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> > + tree lhs_old_stmt = gimple_get_lhs (old_stmt);
> > + tree type = TREE_TYPE (lhs_old_stmt);
> > +
> > + /* Create the argument set for use by gimple_build_call_internal_vec. */
> > + for (unsigned i = 0; i < this->m_num_args; i++)
> > + args[i] = lhs_old_stmt;
> > +
> > + /* Create the new pattern statements. */
> > + call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
> > + tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
> > + gimple_call_set_lhs (call_stmt, var);
> > + gimple_set_location (call_stmt, gimple_location (old_stmt));
> > + gimple_call_set_nothrow (call_stmt, true);
> > +
> > + /* Adjust the book-keeping for the new and old statements for use during
> > + SLP. This is required to get the right VF and statement during SLP
> > + analysis. These changes are made after relevancy has been set for
> > + the nodes, so we need to update them manually. Any changes will be
> > + undone if SLP is cancelled. */
> > + call_stmt_info
> > + = vinfo->add_pattern_stmt (call_stmt, stmt_info);
> > +
> > + /* Make sure to mark the representative statement pure_slp and
> > + relevant. */
> > + STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope;
> > + STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > +
> > + /* add_pattern_stmt can't be done in vect_mark_pattern_stmts because
> > + the non-SLP pattern matchers already have added the statement to VINFO
> > + by the time it is called. Some of them need to modify the returned
> > + stmt_info. vect_mark_pattern_stmts is called by recog_pattern and it
> > + would increase the size of each pattern with boilerplate code to make
> > + the call there. */
> > + vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
> > + SLP_TREE_VECTYPE (node));
> > +
> > + /* Since we are replacing all the statements in the group with the same
> > + thing it doesn't really matter. So just set it every time a new stmt
> > + is created. */
> > + SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
> > + SLP_TREE_LANE_PERMUTATION (node) = vNULL;
>
> I guess you want to release it?
>
> SLP_TREE_LANE_PERMUTATION (node).release ();
>
>
> > + SLP_TREE_CODE (node) = CALL_EXPR;
> > + }
> > +}
> > +
> > +/*******************************************************************************
> > + * complex_add_pattern class
> > + ******************************************************************************/
> > +
> > +class complex_add_pattern : public complex_pattern
> > +{
> > + protected:
> > + complex_add_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > + : complex_pattern (node, m_ops, ifn)
> > + {
> > + this->m_num_args = 2;
> > + }
> > +
> > + public:
> > + void build (vec_info *);
> > + static internal_fn
> > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > + vec<slp_tree> *);
> > +
> > + static vect_pattern*
> > + recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > +};
> > +
> > +/* Perform a replacement of the detected complex add pattern with the new
> > + instruction sequences. */
> > +
> > +void
> > +complex_add_pattern::build (vec_info *vinfo)
> > +{
> > + auto_vec<slp_tree> nodes;
> > + slp_tree node = this->m_ops[0];
> > + vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > +
> > + /* First re-arrange the children. */
> > + nodes.create (children.length ());
> > + nodes.quick_push (children[0]);
> > + nodes.quick_push (vect_build_linear_node (children[1]));
>
> So I expected whether we swap even/odd lanes of children[1] to be
> dependent on the matched complex operation. I guess we expect
> it for all complex_adds.
>
> The rest of the patch looks OK.
>
> Thus, OK with the suggested changes above. I think it's fine to
> commit now before you completed the rest.
>
> Thanks,
> Richard.
>
> > +
> > + SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > + SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
> > +
> > + complex_pattern::build (vinfo);
> > +}
> > +
> > +/* Pattern matcher for trying to match complex addition pattern in SLP tree.
> > +
> > + This function matches the patterns shaped as:
> > +
> > + c[i] = a[i] - b[i+1];
> > + c[i+1] = a[i+1] + b[i];
> > +
> > + OP is the pair operation found by vect_detect_pair_op and OPS holds the
> > + matched nodes. If a match occurred then IFN_COMPLEX_ADD_ROT90 or
> > + IFN_COMPLEX_ADD_ROT270 is returned, otherwise IFN_LAST. */
> > +
> > +internal_fn
> > +complex_add_pattern::matches (complex_operation_t op,
> > + slp_tree_to_load_perm_map_t *perm_cache,
> > + vec<slp_tree> *ops)
> > +{
> > + internal_fn ifn = IFN_LAST;
> > +
> > + /* Find the two components. Rotation in the complex plane will modify
> > + the operations:
> > +
> > + * Rotation 0: + +
> > + * Rotation 90: - +
> > + * Rotation 180: - -
> > + * Rotation 270: + -
> > +
> > + Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
> > + to care about them here. */
> > + if (op == MINUS_PLUS)
> > + ifn = IFN_COMPLEX_ADD_ROT90;
> > + else if (op == PLUS_MINUS)
> > + ifn = IFN_COMPLEX_ADD_ROT270;
> > + else
> > + return ifn;
> > +
> > + /* Verify that there is a permute, otherwise this isn't a pattern we
> > + support. */
> > + gcc_assert (ops->length () == 2);
> > +
> > + vec<slp_tree> children = SLP_TREE_CHILDREN ((*ops)[0]);
> > +
> > + /* First node must be unpermuted. */
> > + if (linear_loads_p (perm_cache, children[0]).first != PERM_EVENODD)
> > + return IFN_LAST;
> > +
> > + /* Second node must be permuted. */
> > + if (linear_loads_p (perm_cache, children[1]).first != PERM_ODDEVEN)
> > + return IFN_LAST;
> > +
> > + return ifn;
> > +}
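The rotation table in `matches` can be checked numerically. The sketch below uses only C99 `<complex.h>` and is independent of GCC internals. `cadd_rot`, `rotate_ref` and `table_matches` are hypothetical helpers. The sketch compares the component-wise form, with the signs listed in the table, against a + b * I^k for k = 0..3.

```c
#include <complex.h>
#include <stdbool.h>

/* Component-wise a + b * I^(ROT/90), following the sign table in
   complex_add_pattern::matches.  Odd quarter-turns (90, 270) read b's
   real and imaginary parts in swapped order.  */
static double complex
cadd_rot (double complex a, double complex b, int rot)
{
  double ar = creal (a), ai = cimag (a);
  double br = creal (b), bi = cimag (b);

  switch (rot)
    {
    case 0:
      return (ar + br) + (ai + bi) * I;   /* + + */
    case 90:
      return (ar - bi) + (ai + br) * I;   /* - + */
    case 180:
      return (ar - br) + (ai - bi) * I;   /* - - */
    default:
      return (ar + bi) + (ai - br) * I;   /* + -, i.e. 270 */
    }
}

/* Reference: rotate b by K quarter-turns using complex multiplication. */
static double complex
rotate_ref (double complex a, double complex b, int k)
{
  double complex r = 1.0;
  for (int j = 0; j < k; j++)
    r *= I;
  return a + b * r;
}

/* True if all four rotations agree with the table for this A and B.
   Exact equality is fine for small values such as 1 + 2i and 3 + 5i,
   since multiplying by 0, 1 or -1 introduces no rounding.  */
static bool
table_matches (double complex a, double complex b)
{
  for (int k = 0; k < 4; k++)
    if (cadd_rot (a, b, 90 * k) != rotate_ref (a, b, k))
      return false;
  return true;
}
```

Rotations 0 and 180 keep b in natural order and reduce to ordinary vector add/sub, which is why only MINUS_PLUS (90) and PLUS_MINUS (270) are mapped to internal functions. The swapped read in the 90 and 270 cases is also why `matches` requires the second child to be PERM_ODDEVEN.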
> > +
> > +/* Attempt to recognize a complex add pattern. */
> > +
> > +vect_pattern*
> > +complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_tree *node)
> > +{
> > + auto_vec<slp_tree> ops;
> > + complex_operation_t op
> > + = vect_detect_pair_op (*node, true, &ops);
> > + internal_fn ifn = complex_add_pattern::matches (op, perm_cache, &ops);
> > + if (!vect_pattern_validate_optab (ifn, *node))
> > + return NULL;
> > +
> > + return new complex_add_pattern (node, &ops, ifn);
> > +}
> > +
> > +/*******************************************************************************
> > + * Pattern matching definitions
> > + ******************************************************************************/
> > +
> > +#define SLP_PATTERN(x) &x::recognize
> > +vect_pattern_decl_t slp_patterns[]
> > +{
> > + /* For the least amount of back-tracking and the most efficient matching,
> > + order patterns from the largest to the smallest, especially if they
> > + overlap in what they can detect. */
> > +
> > + SLP_PATTERN (complex_add_pattern),
> > +};
> > +#undef SLP_PATTERN
> > +
> > +/* Set the number of SLP pattern matchers available. */
> > +size_t num__slp_patterns = sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
> > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > index 59a8630f74aaa43e6a4df03fa6745dbc04fa452e..efbc726667614f90b4ae8912e43b402e260e580d 100644
> > --- a/gcc/tree-vect-slp.c
> > +++ b/gcc/tree-vect-slp.c
> > @@ -105,7 +105,7 @@ _slp_tree::~_slp_tree ()
> >
> > /* Recursively free the memory allocated for the SLP tree rooted at NODE. */
> >
> > -static void
> > +void
> > vect_free_slp_tree (slp_tree node)
> > {
> > int i;
> > @@ -148,6 +148,18 @@ vect_free_slp_instance (slp_instance instance)
> >
> > /* Create an SLP node for SCALAR_STMTS. */
> >
> > +slp_tree
> > +vect_create_new_slp_node (unsigned nops, tree_code code)
> > +{
> > + slp_tree node = new _slp_tree;
> > + SLP_TREE_SCALAR_STMTS (node) = vNULL;
> > + SLP_TREE_CHILDREN (node).create (nops);
> > + SLP_TREE_DEF_TYPE (node) = vect_internal_def;
> > + SLP_TREE_CODE (node) = code;
> > + return node;
> > +}
> > +/* Create an SLP node for SCALAR_STMTS. */
> > +
> > static slp_tree
> > vect_create_new_slp_node (slp_tree node,
> > vec<stmt_vec_info> scalar_stmts, unsigned nops)
> > @@ -208,7 +220,7 @@ typedef struct _slp_oprnd_info
> >
> > /* Allocate operands info for NOPS operands, and GROUP_SIZE def-stmts for each
> > operand. */
> > -static vec<slp_oprnd_info>
> > +static vec<slp_oprnd_info>
> > vect_create_oprnd_info (int nops, int group_size)
> > {
> > int i;
> > @@ -1096,7 +1108,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
> > {
> > if (dump_enabled_p ())
> > {
> > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "Build SLP failed: different operation "
> > "in stmt %G", stmt);
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > @@ -2172,6 +2184,84 @@ calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
> > return exact_div (common_multiple (nunits, group_size), group_size);
> > }
> >
> > +/* Helper function of vect_match_slp_patterns.
> > +
> > + Attempts to match patterns against the slp tree rooted in REF_NODE using
> > + VINFO. Patterns are matched in post-order traversal.
> > +
> > + If matching is successful the tree rooted at REF_NODE is updated in place
> > + and true is returned; otherwise it is left unchanged and false is returned. */
> > +
> > +static bool
> > +vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> > + slp_tree_to_load_perm_map_t *perm_cache,
> > + hash_set<slp_tree> *visited)
> > +{
> > + unsigned i;
> > + slp_tree node = *ref_node;
> > + bool found_p = false;
> > + if (!node || visited->add (node))
> > + return false;
> > +
> > + slp_tree child;
> > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > + found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
> > + vinfo, perm_cache, visited);
> > +
> > + for (unsigned x = 0; x < num__slp_patterns; x++)
> > + {
> > + vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
> > + if (pattern)
> > + {
> > + pattern->build (vinfo);
> > + delete pattern;
> > + found_p = true;
> > + }
> > + }
> > +
> > + return found_p;
> > +}
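The walk above is a post-order traversal of the SLP graph. The graph is a DAG, since nodes can be shared. The visited set ensures each node is processed once, and every entry of slp_patterns is tried on a node after its children. The minimal C sketch below reproduces only that control flow. It assumes a toy node type with a visited flag in place of hash_set. `toy_node`, `toy_pattern_fn`, `match_id_one` and `match_patterns` are hypothetical and not part of GCC.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 2

/* Hypothetical stand-in for an SLP node. */
typedef struct toy_node
{
  int id;
  bool visited;                  /* plays the role of the hash_set */
  size_t n_children;
  struct toy_node *children[MAX_CHILDREN];
} toy_node;

/* A pattern returns true when it matched (and would rewrite) NODE. */
typedef bool (*toy_pattern_fn) (toy_node *node);

/* Example pattern: pretend only node 1 has the right shape. */
static bool
match_id_one (toy_node *node)
{
  return node->id == 1;
}

/* Post-order walk: recurse into the children first, then try every
   pattern on NODE.  Shared nodes are processed once.  The visit order
   is appended to ORDER so it can be inspected.  */
static bool
match_patterns (toy_node *node, const toy_pattern_fn *patterns,
                size_t n_patterns, int *order, size_t *n_order)
{
  if (!node || node->visited)
    return false;
  node->visited = true;

  bool found = false;
  for (size_t i = 0; i < node->n_children; i++)
    if (match_patterns (node->children[i], patterns, n_patterns,
                        order, n_order))
      found = true;

  order[(*n_order)++] = node->id;
  for (size_t x = 0; x < n_patterns; x++)
    if (patterns[x] (node))
      found = true;
  return found;
}
```

Take a diamond-shaped graph: root 0 with children 1 and 2, both sharing leaf 3. The recorded order is 3, 1, 2, 0, so the shared leaf is visited only once.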
> > +
> > +/* Applies pattern matching to the given SLP tree rooted in REF_NODE using
> > + vec_info VINFO.
> > +
> > + Returns true if any pattern matched, in which case the tree is modified in
> > + place. Patterns are tried in order and multiple patterns may match. */
> > +
> > +static bool
> > +vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> > + hash_set<slp_tree> *visited,
> > + slp_tree_to_load_perm_map_t *perm_cache,
> > + scalar_stmts_to_slp_tree_map_t * /* bst_map */)
> > +{
> > + DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > + slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> > +
> > + if (dump_enabled_p ())
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Analyzing SLP tree %p for patterns\n",
> > + SLP_INSTANCE_TREE (instance));
> > +
> > + bool found_p
> > + = vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
> > +
> > + if (found_p)
> > + {
> > + if (dump_enabled_p ())
> > + {
> > + dump_printf_loc (MSG_NOTE, vect_location,
> > + "Pattern matched SLP tree\n");
> > + vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > + }
> > + }
> > +
> > + return found_p;
> > +}
> > +
> > +/* Analyze an SLP instance starting from a group of grouped stores. Call
> > + vect_build_slp_tree to build a tree of packed stmts if possible.
> > + Return FALSE if it's impossible to SLP any stmt in the loop. */
> > +
> > static bool
> > vect_analyze_slp_instance (vec_info *vinfo,
> > scalar_stmts_to_slp_tree_map_t *bst_map,
> > @@ -2537,6 +2627,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
> > {
> > unsigned int i;
> > stmt_vec_info first_element;
> > + slp_instance instance;
> >
> > DUMP_VECT_SCOPE ("vect_analyze_slp");
> >
> > @@ -2583,6 +2674,13 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
> > slp_inst_kind_reduc_group, max_tree_size);
> > }
> >
> > + hash_set<slp_tree> visited_patterns;
> > + slp_tree_to_load_perm_map_t perm_cache;
> > + /* See if any patterns can be found in the SLP tree. */
> > + FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
> > + vect_match_slp_patterns (instance, vinfo, &visited_patterns, &perm_cache,
> > + bst_map);
> > +
> > /* The map keeps a reference on SLP nodes built, release that. */
> > for (scalar_stmts_to_slp_tree_map_t::iterator it = bst_map->begin ();
> > it != bst_map->end (); ++it)
> > @@ -3902,7 +4000,7 @@ vect_bb_partition_graph (bb_vec_info bb_vinfo)
> > and return it. Do not account defs that are marked in LIFE and
> > update LIFE according to uses of NODE. */
> >
> > -static void
> > +static void
> > vect_bb_slp_scalar_cost (vec_info *vinfo,
> > slp_tree node, vec<bool, va_heap> *life,
> > stmt_vector_for_cost *cost_vec,
> > @@ -3913,7 +4011,7 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
> > slp_tree child;
> >
> > if (visited.add (node))
> > - return;
> > + return;
> >
> > FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> > {
> > @@ -4222,7 +4320,7 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal,
> > {
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "Failed to SLP the basic block.\n");
> > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > "not vectorized: failed to find SLP opportunities "
> > "in basic block.\n");
> > }
> > @@ -5090,7 +5188,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
> > if (!analyze_only)
> > {
> > tree mask_vec = NULL_TREE;
> > -
> > +
> > if (! noop_p)
> > mask_vec = vect_gen_perm_mask_checked (vectype, indices);
> >
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 0ee4ef32eb2dbe2242327a6ed61c1245a0f59ce6..ec013f0ab27cdcc83b9a68781a0f13bd687d43c3 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -27,6 +27,7 @@ typedef class _stmt_vec_info *stmt_vec_info;
> > #include "tree-hash-traits.h"
> > #include "target.h"
> > #include "alloc-pool.h"
> > +#include "internal-fn.h"
> >
> >
> > /* Used for naming of new temporaries. */
> > @@ -115,6 +116,8 @@ typedef hash_map<tree_operand_hash,
> > SLP
> > ************************************************************************/
> > typedef struct _slp_tree *slp_tree;
> > +typedef vec<std::pair<unsigned, unsigned> > lane_permutation_t;
> > +typedef vec<unsigned> load_permutation_t;
> >
> > extern object_allocator<_slp_tree> *slp_tree_pool;
> >
> > @@ -137,11 +140,11 @@ struct _slp_tree {
> >
> > /* Load permutation relative to the stores, NULL if there is no
> > permutation. */
> > - vec<unsigned> load_permutation;
> > + load_permutation_t load_permutation;
> > /* Lane permutation of the operands scalar lanes encoded as pairs
> > of { operand number, lane number }. The number of elements
> > denotes the number of output lanes. */
> > - vec<std::pair<unsigned, unsigned> > lane_permutation;
> > + lane_permutation_t lane_permutation;
> >
> > tree vectype;
> > /* Vectorized stmt/s. */
> > @@ -361,6 +364,7 @@ public:
> > ~vec_info ();
> >
> > stmt_vec_info add_stmt (gimple *);
> > + stmt_vec_info add_pattern_stmt (gimple *, stmt_vec_info);
> > stmt_vec_info lookup_stmt (gimple *);
> > stmt_vec_info lookup_def (tree);
> > stmt_vec_info lookup_single_use (tree);
> > @@ -406,7 +410,7 @@ public:
> >
> > private:
> > stmt_vec_info new_stmt_vec_info (gimple *stmt);
> > - void set_vinfo_for_stmt (gimple *, stmt_vec_info);
> > + void set_vinfo_for_stmt (gimple *, stmt_vec_info, bool = true);
> > void free_stmt_vec_infos ();
> > void free_stmt_vec_info (stmt_vec_info);
> > };
> > @@ -1990,8 +1994,13 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
> > vec<tree>, unsigned int, vec<tree> &);
> > extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
> > extern bool vect_update_shared_vectype (stmt_vec_info, tree);
> > +extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
> > +extern void vect_free_slp_tree (slp_tree);
> >
> > /* In tree-vect-patterns.c. */
> > +extern void
> > +vect_mark_pattern_stmts (vec_info *, stmt_vec_info, gimple *, tree);
> > +
> > /* Pattern recognition functions.
> > Additional pattern recognition functions can (and will) be added
> > in the future. */
> > @@ -2003,4 +2012,84 @@ void vect_free_loop_info_assumptions (class loop *);
> > gimple *vect_loop_vectorized_call (class loop *, gcond **cond = NULL);
> > bool vect_stmt_dominates_stmt_p (gimple *, gimple *);
> >
> > +/* SLP Pattern matcher types, tree-vect-slp-patterns.c. */
> > +
> > +/* Forward declaration of possible two operands operation that can be matched
> > + by the complex numbers pattern matchers. */
> > +enum _complex_operation : unsigned;
> > +
> > +/* All possible load permute values that could result from the partial data-flow
> > + analysis. */
> > +typedef enum _complex_perm_kinds {
> > + PERM_UNKNOWN,
> > + PERM_EVENODD,
> > + PERM_ODDEVEN,
> > + PERM_ODDODD,
> > + PERM_EVENEVEN,
> > + /* Can be combined with any other PERM values. */
> > + PERM_TOP
> > +} complex_perm_kinds_t;
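The sketch below illustrates how a load permutation might map onto these kinds. It is a hypothetical classifier that looks only at the parity of each loaded index. Even lanes reading even indices and odd lanes reading odd indices give PERM_EVENODD, the swapped case gives PERM_ODDEVEN, and so on. This is an assumption for illustration only. The patch's `linear_loads_p` works on SLP nodes through `perm_cache` and may apply further conditions, and this sketch never produces PERM_TOP. `classify_load_perm` is a made-up name.

```c
#include <stddef.h>

/* Local copy of the enum from the patch. */
typedef enum _complex_perm_kinds {
  PERM_UNKNOWN,
  PERM_EVENODD,
  PERM_ODDEVEN,
  PERM_ODDODD,
  PERM_EVENEVEN,
  PERM_TOP
} complex_perm_kinds_t;

/* Hypothetical parity-only classifier for a load permutation: every even
   output lane must read an index with the same parity as lane 0, every
   odd lane one with the same parity as lane 1.  */
static complex_perm_kinds_t
classify_load_perm (const unsigned *perm, size_t len)
{
  if (len == 0 || len % 2 != 0)
    return PERM_UNKNOWN;

  unsigned even_par = perm[0] % 2, odd_par = perm[1] % 2;
  for (size_t i = 0; i < len; i++)
    if (perm[i] % 2 != (i % 2 ? odd_par : even_par))
      return PERM_UNKNOWN;

  if (even_par == 0)
    return odd_par == 1 ? PERM_EVENODD : PERM_EVENEVEN;
  return odd_par == 0 ? PERM_ODDEVEN : PERM_ODDODD;
}
```

Under this rule the `{ 0 1 }` load of a in the dump classifies as PERM_EVENODD and the `{ 1 0 }` load of b as PERM_ODDEVEN. These are the two kinds complex_add_pattern::matches requires of its children.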
> > +
> > +/* A pair with a load permute and a corresponding complex_perm_kind which gives
> > + information about the load it represents. */
> > +typedef std::pair<complex_perm_kinds_t, load_permutation_t>
> > + complex_load_perm_t;
> > +
> > +/* Cache from nodes to the load permutation they represent. */
> > +typedef hash_map <slp_tree, complex_load_perm_t>
> > + slp_tree_to_load_perm_map_t;
> > +
> > +/* Vector pattern matcher base class. All SLP pattern matchers must inherit
> > + from this type. */
> > +
> > +class vect_pattern
> > +{
> > + protected:
> > + /* The number of arguments that the IFN requires. */
> > + unsigned m_num_args;
> > +
> > + /* The internal function that will be used when a pattern is created. */
> > + internal_fn m_ifn;
> > +
> > + /* The current node being inspected. */
> > + slp_tree *m_node;
> > +
> > + /* The list of operands to be the children for the node produced when the
> > + internal function is created. */
> > + vec<slp_tree> m_ops;
> > +
> > + /* Default constructor where NODE is the root of the tree to inspect. */
> > + vect_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > + {
> > + this->m_ifn = ifn;
> > + this->m_node = node;
> > + this->m_ops.create (0);
> > + this->m_ops.safe_splice (*m_ops);
> > + }
> > +
> > + public:
> > +
> > + /* Create a new instance of the pattern matcher class of the given type. */
> > + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > +
> > + /* Build the pattern from the data collected so far. */
> > + virtual void build (vec_info *) = 0;
> > +
> > + /* Default destructor. */
> > + virtual ~vect_pattern ()
> > + {
> > + this->m_ops.release ();
> > + }
> > +};
> > +
> > +/* Function pointer to create a new pattern matcher from a generic type. */
> > +typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
> > + slp_tree *);
> > +
> > +/* List of supported pattern matchers. */
> > +extern vect_pattern_decl_t slp_patterns[];
> > +
> > +/* Number of supported pattern matchers. */
> > +extern size_t num__slp_patterns;
> > +
> > #endif /* GCC_TREE_VECTORIZER_H */
> > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> > index b63dda31a0839b094985d306a993503cc00dd7eb..d81774b242569262a51b7be02815acd6d1a6bfd0 100644
> > --- a/gcc/tree-vectorizer.c
> > +++ b/gcc/tree-vectorizer.c
> > @@ -525,6 +525,19 @@ vec_info::add_stmt (gimple *stmt)
> > return res;
> > }
> >
> > +/* Record that STMT belongs to the vectorizable region. Create a new
> > + stmt_vec_info for it, record STMT_INFO as its related statement and return
> > + the new stmt_vec_info. */
> > +
> > +stmt_vec_info
> > +vec_info::add_pattern_stmt (gimple *stmt, stmt_vec_info stmt_info)
> > +{
> > + stmt_vec_info res = new_stmt_vec_info (stmt);
> > + set_vinfo_for_stmt (stmt, res, false);
> > + STMT_VINFO_RELATED_STMT (res) = stmt_info;
> > + return res;
> > +}
> > +
> > /* If STMT has an associated stmt_vec_info, return that vec_info, otherwise
> > return null. It is safe to call this function on any statement, even if
> > it might not be part of the vectorizable region. */
> > @@ -702,12 +715,12 @@ vec_info::new_stmt_vec_info (gimple *stmt)
> > /* Associate STMT with INFO. */
> >
> > void
> > -vec_info::set_vinfo_for_stmt (gimple *stmt, stmt_vec_info info)
> > +vec_info::set_vinfo_for_stmt (gimple *stmt, stmt_vec_info info, bool check_ro)
> > {
> > unsigned int uid = gimple_uid (stmt);
> > if (uid == 0)
> > {
> > - gcc_assert (!stmt_vec_info_ro);
> > + gcc_assert (!check_ro || !stmt_vec_info_ro);
> > gcc_checking_assert (info);
> > uid = stmt_vec_infos.length () + 1;
> > gimple_set_uid (stmt, uid);
> >
> >
> > The 11/27/2020 10:30, Richard Biener wrote:
> > > On Thu, 26 Nov 2020, Tamar Christina wrote:
> > >
> > > > Hi Richi,
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Tuesday, November 24, 2020 2:15 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> > > > > hongtao.liu@intel.com
> > > > > Subject: RE: [PATCH] middle-end: Support complex Addition
> > > > >
> > > > > On Tue, 24 Nov 2020, Tamar Christina wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > Sent: Tuesday, November 24, 2020 12:24 PM
> > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> > > > > > > hongtao.liu@intel.com
> > > > > > > Subject: RE: [PATCH] middle-end: Support complex Addition
> > > > > > >
> > > > > > > On Tue, 24 Nov 2020, Tamar Christina wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > > Sent: Tuesday, November 24, 2020 10:54 AM
> > > > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> > > > > > > > > hongtao.liu@intel.com
> > > > > > > > > Subject: RE: [PATCH] middle-end: Support complex Addition
> > > > > > > > >
> > > > > > > > > On Tue, 24 Nov 2020, Richard Biener wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, 23 Nov 2020, Tamar Christina wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Richi,
> > > > > > > > > > >
> > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > Sent: Monday, November 23, 2020 3:51 PM
> > > > > > > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> > > > > > > > > > > > hongtao.liu@intel.com
> > > > > > > > > > > > Subject: Re: [PATCH] middle-end: Support complex Addition
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 23 Nov 2020, Tamar Christina wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch adds support for
> > > > > > > > > > > > >
> > > > > > > > > > > > > * Complex Addition with rotation of 90 and 270.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Addition with rotation of the second argument around the Argand plane.
> > > > > > > > > > > > > > Supported rotations are 90 and 270.
> > > > > > > > > > > > >
> > > > > > > > > > > > > c = a + (b * I) and c = a + (b * I * I * I)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > For the full code I have pushed a branch at refs/users/tnfchris/heads/complex-numbers.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As a side note, I still needed to set
> > > > > > > > > > > > >
> > > > > > > > > > > > > STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > > > > > > > > > > > >
> > > > > > > > > > > > > as the new hybrid detection code only runs for loop aware SLP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues, but
> > > > > > > > > > > > > > sorting out the testcases as TCL is processed before the CPP..
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ok for master?
> > > > > > > > > > > >
> > > > > > > > > > > > So I failed to apply this patch (and after manual fixup build).
> > > > > > > > > > > > > I went ahead and checked out the branch, patching the tree with
> > > > > > > > > > > > > x86 support for cadd90 with -msse3 or -mavx2 using the attached
> > > > > > > > > > > > > patch.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > It requires a patch you have previously approved pending the rest, so it's
> > > > > > > > > > > > not committed yet?
> > > > > > > > > >
> > > > > > > > > > Ah, I missed that.
> > > > > > > > > >
> > > > > > > > > > > > For
> > > > > > > > > > > >
> > > > > > > > > > > > double c[1024], b[1024], a[1024];
> > > > > > > > > > > >
> > > > > > > > > > > > void foo ()
> > > > > > > > > > > > {
> > > > > > > > > > > > for (int i = 0; i < 512; ++i)
> > > > > > > > > > > > {
> > > > > > > > > > > > c[2*i] = a[2*i] - b[2*i+1];
> > > > > > > > > > > > c[2*i+1] = a[2*i+1] + b[2*i];
> > > > > > > > > > > > }
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > I then see
> > > > > > > > > > > >
> > > > > > > > > > > > t.c:5:21: note: Analyzing SLP tree 0x39c0010 for patterns
> > > > > > > > > > > > > t.c:5:21: note: Found COMPLEX_ADD_ROT90 pattern in SLP tree
> > > > > > > > > > > > > t.c:5:21: note: Target supports COMPLEX_ADD_ROT90 vectorization with mode vector(2) double
> > > > > > > > > > > > t.c:5:21: note: Pattern matched SLP tree
> > > > > > > > > > > > t.c:5:21: note: node 0x39c0010 (max_nunits=2, refcnt=2)
> > > > > > > > > > > > t.c:5:21: note: op template: c[_1] = _5;
> > > > > > > > > > > > t.c:5:21: note: stmt 0 c[_1] = _5;
> > > > > > > > > > > > t.c:5:21: note: stmt 1 c[_3] = _8;
> > > > > > > > > > > > t.c:5:21: note: children 0x39c0080
> > > > > > > > > > > > t.c:5:21: note: node 0x39c0080 (max_nunits=2, refcnt=2)
> > > > > > > > > > > > > t.c:5:21: note: op template: slp_patt_29 = .COMPLEX_ADD_ROT90 (_5, _5);
> > > > > > > > > > > > t.c:5:21: note: stmt 0 _5 = _2 - _4;
> > > > > > > > > > > > t.c:5:21: note: stmt 1 _8 = _6 + _7;
> > > > > > > > > > > > t.c:5:21: note: lane permutation { 0[0] 1[1] }
> > > > > > > > > > > > t.c:5:21: note: children 0x39c00f0 0x39c02b0
> > > > > > > > > > > > t.c:5:21: note: node 0x39c00f0 (max_nunits=2, refcnt=2)
> > > > > > > > > > > > t.c:5:21: note: op template: _2 = a[_1];
> > > > > > > > > > > > t.c:5:21: note: stmt 0 _2 = a[_1];
> > > > > > > > > > > > t.c:5:21: note: stmt 1 _6 = a[_3];
> > > > > > > > > > > > t.c:5:21: note: load permutation { 0 1 }
> > > > > > > > > > > > t.c:5:21: note: node 0x39c02b0 (max_nunits=1, refcnt=1)
> > > > > > > > > > > > t.c:5:21: note: op: VEC_PERM_EXPR
> > > > > > > > > > > > t.c:5:21: note: { }
> > > > > > > > > > > > t.c:5:21: note: lane permutation { 0[1] 0[0] }
> > > > > > > > > > > > t.c:5:21: note: children 0x39c0160
> > > > > > > > > > > > t.c:5:21: note: node 0x39c0160 (max_nunits=2, refcnt=2)
> > > > > > > > > > > > t.c:5:21: note: op template: _4 = b[_3];
> > > > > > > > > > > > t.c:5:21: note: stmt 0 _4 = b[_3];
> > > > > > > > > > > > t.c:5:21: note: stmt 1 _7 = b[_1];
> > > > > > > > > > > > t.c:5:21: note: load permutation { 1 0 }
> > > > > > > > > > > >
> > > > > > > > > > > > I'm confused about the lane permutation in the .COMPLEX_ADD_ROT90
> > > > > > > > > > > > node (I guess this permutation is simply ignored by code-generation).
> > > > > > > > > > > > Should it not be there?
> > > > > > > > > > >
> > > > > > > > > > > Yes, I had completely missed that. I forgot to blank it out.
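In these dumps, an entry k[j] in a lane permutation selects lane j of child k. So { 0[0] 1[1] } on a two-operator node blends lane 0 of the first child (the minus) with lane 1 of the second child (the plus). By contrast, { 0[1] 0[0] } swaps the two lanes of a single child. The sketch below is a minimal model of that notation. It assumes plain std::vector lanes rather than GCC's SLP data structures, and apply_lane_perm is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using lanes = std::vector<double>;
// One lane-permutation entry (child index, lane index), dumped as child[lane].
using lane_perm = std::vector<std::pair<unsigned, unsigned>>;

// Output lane i takes lane perm[i].second of child perm[i].first.
lanes apply_lane_perm (const std::vector<lanes> &children, const lane_perm &perm)
{
  lanes out;
  for (const auto &e : perm)
    {
      assert (e.first < children.size ());
      assert (e.second < children[e.first].size ());
      out.push_back (children[e.first][e.second]);
    }
  return out;
}
```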
> > > > > > > > > >
> > > > > > > > > > Btw, in this context
> > > > > > > > > >
> > > > > > > > > > /* Unfortunately still need this on the new pattern because non-loop SLP
> > > > > > > > > >    doesn't call vect_detect_hybrid_slp so it never updates it.  */
> > > > > > > > > > STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > > > > > > > > >
> > > > > > > > > > this isn't about the hybrid marker but about vect_mark_slp_stmts,
> > > > > > > > > > which marks all stmts participating in the SLP graph with pure_slp;
> > > > > > > > > > it only marks SLP_TREE_SCALAR_STMTS but not SLP_TREE_REPRESENTATIVE.
> > > > > > > > > > I think that's OK and thus the above setting of pure_slp is OK as well,
> > > > > > > > > > just the comment is off. Maybe make it "Make sure to mark the
> > > > > > > > > > representative statement pure_slp and relevant".
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Otherwise the outcome is now as expected. Permute optimization
> > > > > > > > > > > > later produces
> > > > > > > > > > > >
> > > > > > > > > > > > t.c:5:21: note: node 0x39c0080 (max_nunits=2, refcnt=1)
> > > > > > > > > > > > t.c:5:21: note: op template: slp_patt_29 = .COMPLEX_ADD_ROT90 (_5, _5);
> > > > > > > > > > > > t.c:5:21: note: stmt 0 _5 = _2 - _4;
> > > > > > > > > > > > t.c:5:21: note: stmt 1 _8 = _6 + _7;
> > > > > > > > > > > > t.c:5:21: note: lane permutation { 0[0] 1[1] }
> > > > > > > > > > > > t.c:5:21: note: children 0x39c00f0 0x39c02b0
> > > > > > > > > > > > ...
> > > > > > > > > > > > t.c:5:21: note: node 0x39c02b0 (max_nunits=1, refcnt=1)
> > > > > > > > > > > > t.c:5:21: note: op: VEC_PERM_EXPR
> > > > > > > > > > > > t.c:5:21: note: { }
> > > > > > > > > > > > t.c:5:21: note: lane permutation { 0[0] 0[1] }
> > > > > > > > > > > > t.c:5:21: note: children 0x39c0160
> > > > > > > > > > > > t.c:5:21: note: node 0x39c0160 (max_nunits=2, refcnt=1)
> > > > > > > > > > > > t.c:5:21: note: op template: _4 = b[_3];
> > > > > > > > > > > > t.c:5:21: note: stmt 0 _7 = b[_1];
> > > > > > > > > > > > t.c:5:21: note: stmt 1 _4 = b[_3];
> > > > > > > > > > > >
> > > > > > > > > > > > where the noop permute is correctly costed (and thus is just a
> > > > > > > > > > > > cosmetic annoyance):
> > > > > > > > > > > >
> > > > > > > > > > > > 0x3a13870 a[_1] 1 times vector_load costs 12 in body
> > > > > > > > > > > > 0x3a13870 b[_1] 1 times vector_load costs 12 in body
> > > > > > > > > > > > 0x3a13870 <unknown> 0 times vec_perm costs 0 in body
> > > > > > > > > > > > 0x3a13870 .COMPLEX_ADD_ROT90 (_5, _5) 1 times vector_stmt costs 12 in body
> > > > > > > > > > > > 0x3a13870 _5 1 times vector_store costs 12 in body
> > > > > > > > > > > >
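Composing the two permutations shows why the permute ends up a no-op. The load of b carries load permutation { 1 0 }, and the VEC_PERM_EXPR above it selects { 0[1] 0[0] }, which is a second swap. The net effect is therefore the identity, and permute optimization can push it into the load. The sketch below composes single-input permutations. compose and is_identity are hypothetical helpers, and perm[i] is read as "lane i takes element perm[i]".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using perm_t = std::vector<unsigned>;

// Net permutation when OUTER is applied on top of INNER:
// result lane i reads inner lane outer[i], i.e. element inner[outer[i]].
perm_t compose (const perm_t &inner, const perm_t &outer)
{
  perm_t r;
  for (unsigned l : outer)
    {
      assert (l < inner.size ());
      r.push_back (inner[l]);
    }
  return r;
}

// A permutation is a no-op if every lane keeps its own element.
bool is_identity (const perm_t &p)
{
  for (std::size_t i = 0; i < p.size (); ++i)
    if (p[i] != i)
      return false;
  return true;
}
```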
> > > > > > > > > > > > Code generated is also superior (-msse3):
> > > > > > > > > > > >
> > > > > > > > > > > > .L2:
> > > > > > > > > > > > movapd a(%rax), %xmm0
> > > > > > > > > > > > addsubpd b(%rax), %xmm0
> > > > > > > > > > > > addq $16, %rax
> > > > > > > > > > > > movaps %xmm0, c-16(%rax)
> > > > > > > > > > > > cmpq $8192, %rax
> > > > > > > > > > > > jne .L2
> > > > > > > > > > > >
> > > > > > > > > > > > compared to GCC 10 where we have an extra permute
> > > > > > > > > > > >
> > > > > > > > > > > > .L2:
> > > > > > > > > > > > movapd b(%rax), %xmm0
> > > > > > > > > > > > movapd a(%rax), %xmm1
> > > > > > > > > > > > addq $16, %rax
> > > > > > > > > > > > shufpd $1, %xmm0, %xmm0
> > > > > > > > > > > > addsubpd %xmm0, %xmm1
> > > > > > > > > > > > movaps %xmm1, c-16(%rax)
> > > > > > > > > > > > cmpq $8192, %rax
> > > > > > > > > > > > jne .L2
> > > > > > > > > > > >
> > > > > > > > > > > > which of course makes me wonder whether I have done the x86
> > > > > > > > > > > > support correctly. Ah, I have not. The x86 instructions
> > > > > > > > > > > > do not embed the even/odd lane swap, they just do the mixed
> > > > > > > > > > > > sign operation. So for those we'd need additional optabs
> > > > > > > > > > > > and patterns then.
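A lane-level model confirms Richard's conclusion. ADDSUBPD subtracts in the even lane and adds in the odd lane without swapping anything. COMPLEX_ADD_ROT90 (a + b * I, i.e. (a0 - b1, a1 + b0)) therefore equals ADDSUBPD applied to a lane-swapped b. That swap is the shufpd $1 in the GCC 10 loop. The sketch models 2-lane vectors as std::array, and the helper names are hypothetical.

```cpp
#include <array>
#include <cassert>

using v2df = std::array<double, 2>;

// Model of SSE3 ADDSUBPD: subtract in the even lane, add in the odd lane.
v2df addsub (v2df a, v2df b) { return {a[0] - b[0], a[1] + b[1]}; }

// Model of "shufpd $1, %xmm0, %xmm0": swap the two lanes.
v2df swap_lanes (v2df v) { return {v[1], v[0]}; }

// COMPLEX_ADD_ROT90 on one interleaved complex value: a + b * I.
v2df cadd_rot90 (v2df a, v2df b) { return {a[0] - b[1], a[1] + b[0]}; }
```

Without the swap, ADDSUBPD computes (a0 - b0, a1 + b1), which is a different operation.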
> > > > > > > > > > > >
> > > > > > > > > > > > So I see the branch contains only the complex add so I'm
> > > > > > > > > > > > going through the changes there:
> > > > > > > > > > >
> > > > > > > > > > > Yes, I'm still updating MUL; FMA and FMS are tiny extensions to MUL.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > /* Create an SLP node for SCALAR_STMTS. */
> > > > > > > > > > > >
> > > > > > > > > > > > -static slp_tree
> > > > > > > > > > > > +slp_tree
> > > > > > > > > > > > vect_create_new_slp_node (slp_tree node,
> > > > > > > > > > > > vec<stmt_vec_info> scalar_stmts, unsigned nops)
> > > > > > > > > > > > {
> > > > > > > > > > > > SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
> > > > > > > > > > > > SLP_TREE_CHILDREN (node).create (nops);
> > > > > > > > > > > > SLP_TREE_DEF_TYPE (node) = vect_internal_def;
> > > > > > > > > > > > - SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
> > > > > > > > > > > > - SLP_TREE_LANES (node) = scalar_stmts.length ();
> > > > > > > > > > > > + if (scalar_stmts.exists ())
> > > > > > > > > > > > + {
> > > > > > > > > > > > + SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
> > > > > > > > > > > > + SLP_TREE_LANES (node) = scalar_stmts.length ();
> > > > > > > > > > > > + }
> > > > > > > > > > > > return node;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > so I don't like that very much, I guess we instead want a
> > > > > > > > > > > >
> > > > > > > > > > > > vect_create_new_perm_node (slp_node node, nops)
> > > > > > > > > > > >
> > > > > > > > > > > > which can pre-fill SLP_TREE_CODE.
> > > > > > > > > > > >
> > > > > > > > > > > > You add testsuite/gcc.dg/vect/complex/ but there's neither an
> > > > > > > > > > > > .exp file in it nor is it sourced from vect.exp - I suppose
> > > > > > > > > > > > some bits are missing here on the branch?
> > > > > > > > > > >
> > > > > > > > > > > Ugg, sorry... I forgot a git add...
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > +typedef enum _complex_operation : unsigned {
> > > > > > > > > > > >
> > > > > > > > > > > > uh, oh - C++ I don't know. Is : unsigned required?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > It requires an enum base, so either enum E : int or enum class E,
> > > > > > > > > > > which apparently defaults to int.
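Standard C++11 type traits show the language rules behind that answer. An unscoped enum has a fixed underlying type only when an enum base is written. A scoped enum (enum class) without one gets int. A fixed underlying type is also what allows an opaque declaration. The enumerator names below are placeholders, not the ones in the patch.

```cpp
#include <cassert>
#include <type_traits>

// Unscoped enum with an explicit enum base: underlying type is unsigned.
typedef enum complex_operation_e : unsigned {
  PLUS_PLUS,    // placeholder enumerators for illustration only
  MINUS_PLUS,
  PLUS_MINUS
} complex_operation_t;

// Scoped enum without an enum base: underlying type defaults to int.
enum class complex_op_scoped { PLUS_PLUS, MINUS_PLUS };

// Opaque declaration: only valid because the underlying type is fixed.
enum opaque_op : unsigned;

static_assert (std::is_same<std::underlying_type<complex_operation_t>::type,
                            unsigned>::value, "enum base gives unsigned");
static_assert (std::is_same<std::underlying_type<complex_op_scoped>::type,
                            int>::value, "enum class defaults to int");
```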
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > +/* Check to see if all loads rooted in ROOT are linear. Linearity is
> > > > > > > > > > > > + defined as having no gaps between values loaded. */
> > > > > > > > > > > >
> > > > > > > > > > > > what is actually returned?
> > > > > > > > > > >
> > > > > > > > > > > It returns the load permute that the node being inspected would produce.
> > > > > > > > > > > Or rather, it shows how the data flows through the tree rooted at that node.
> > > > > > > > > > >
> > > > > > > > > > > It's used to determine if the operation being done does the odd/even lane
> > > > > > > > > > > swapping. This becomes more important for MUL as I need to distinguish between
> > > > > > > > > > > a conjugate and a rotation. Both of these produce just a negate node, but what they
> > > > > > > > > > > negate determines what the operation is.
> > > > > > > > > >
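The sketch below is a simplified model of the linearity notion in that comment. It assumes a load permutation is just a list of element indices per lane. "Linear" then means lane i loads element first + i, with no gaps. { 0 1 } is linear. The { 1 0 } seen on the b loads in the first dump is an even/odd swap, and a { 0 0 } duplicate is neither. This is a hypothetical sketch of the idea, not the patch's linear_loads_p, which also walks lane permutations and caches results per node.

```cpp
#include <cassert>
#include <vector>

using load_perm = std::vector<unsigned>;

// True if consecutive lanes load consecutive elements with no gaps.
bool linear_p (const load_perm &p)
{
  for (unsigned i = 1; i < p.size (); ++i)
    if (p[i] != p[0] + i)
      return false;
  return true;
}

// True if P is a linear run with each even/odd pair of lanes swapped,
// e.g. { 1 0 } or { 3 2 5 4 }.
bool even_odd_swap_p (const load_perm &p)
{
  if (p.size () < 2 || p.size () % 2 != 0)
    return false;
  unsigned base = p[1];  // lane 1 holds the lowest element of the run
  for (unsigned i = 0; i < p.size (); ++i)
    if (p[i] != base + (i ^ 1u))
      return false;
  return true;
}
```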
> > > > > > > > > > So it basically computes what optimize_slp does in its dataflow of
> > > > > > > > > > permutes? But you do
> > > > > > > > > >
> > > > > > > > > > auto_vec<load_permutation_t> all_loads;
> > > > > > > > > > bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> > > > > > > > > >
> > > > > > > > > > slp_tree child;
> > > > > > > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > > > > > > > > > {
> > > > > > > > > > loads = linear_loads_p (perm_cache, child, linear);
> > > > > > > > > > if ((!*linear && !is_perm) || !loads.exists ())
> > > > > > > > > > return loads;
> > > > > > > > > >
> > > > > > > > > > so when there's a branch in the SLP graph and either one is
> > > > > > > > > > not linear you return the permute on that branch? Or if there
> > > > > > > > > > isn't any permute on one branch you return that. Whatever comes
> > > > > > > > > > first? The code at least lacks comments explaining what
> > > > > > > > > > it computes for the root of an SLP subgraph (note the graph can
> > > > > > > > > > now be cyclic, and I don't really see how that is handled
> > > > > > > > > > here). (**)
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > +static load_permutation_t
> > > > > > > > > > > > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root,
> > > > > > > > > > > > + bool *linear)
> > > > > > > > > > > > +{
> > > > > > > > > > > > ...
> > > > > > > > > > > > + else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> > > > > > > > > > > > + {
> > > > > > > > > > > > + loads.create (SLP_TREE_LANES (root));
> > > > > > > > > > > >
> > > > > > > > > > > > it's weird that you need to dig into vect_external_defs - if the
> > > > > > > > > > > > vectorizer for whatever reason decided to not make the defs internal
> > > > > > > > > > > > you shouldn't pick them up here?
> > > > > > > > > > >
> > > > > > > > > > > I do so because for the purposes of these instructions you need to have an
> > > > > > > > > > > alternating sequence. If you have, say, the same externals { _a , _a } that operation
> > > > > > > > > > > isn't what the instruction expects. Accepting random externals was also causing ICEs
> > > > > > > > > > > when compiling SPECFP 2017, but I didn't look too deeply into this as I couldn't convince
> > > > > > > > > > > myself that it should match these.
> > > > > > > > > >
> > > > > > > > > > Did you actually run into a testcase with external loads?
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > + typedef const std::pair<unsigned, unsigned>* cmp_t;
> > > > > > > > > > > > + zipped.qsort ([](const void *a, const void *b) -> int
> > > > > > > > > > > > + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> > > > > > > > > > > >
> > > > > > > > > > > > are we supposed to use lambdas? I guess not.
> > > > > > > > > > >
> > > > > > > > > > > Oh, I wasn't aware lambdas weren't allowed. I'll make it a function.
> > > > > > > > > >
> > > > > > > > > > Jakub says lambdas are OK, so whatever pleases you more.
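Either form works with a C-style qsort comparator, because a lambda without captures converts implicitly to a plain function pointer. The subtraction comparator quoted above is fine for small lane indices, but a three-way comparison avoids any overflow concern. The sketch uses std::qsort, which requires trivially copyable elements. std::pair is not guaranteed to be trivially copyable, so a plain struct stands in for it here. lane_pair, sort_by_first and sort_by_first_lambda are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Plain stand-in for std::pair<unsigned, unsigned>.
struct lane_pair { unsigned first, second; };

// Order by FIRST; (x > y) - (x < y) yields -1, 0 or 1 without overflow.
static int cmp_first (const void *a, const void *b)
{
  unsigned x = static_cast<const lane_pair *> (a)->first;
  unsigned y = static_cast<const lane_pair *> (b)->first;
  return (x > y) - (x < y);
}

void sort_by_first (lane_pair *v, std::size_t n)
{
  std::qsort (v, n, sizeof (lane_pair), cmp_first);
}

// The same comparator as a captureless lambda, converted to a function pointer.
void sort_by_first_lambda (lane_pair *v, std::size_t n)
{
  std::qsort (v, n, sizeof (lane_pair), [] (const void *a, const void *b) -> int {
    unsigned x = static_cast<const lane_pair *> (a)->first;
    unsigned y = static_cast<const lane_pair *> (b)->first;
    return (x > y) - (x < y);
  });
}
```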
> > > > > > > > > >
> > > > > > > > > > (**) so here you are computing a permute to undo that very exact
> > > > > > > > > > permute you discovered earlier - but I don't see how that discovered
> > > > > > > > > > permute reflects reality?
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Anyway, I wonder why we need to make the SLP children "linear"
> > > > > > > > > > > > in the first place?
> > > > > > > > > > >
> > > > > > > > > > > Because the instruction does the permute internally.
> > > > > > > > > > > It really is reflecting complex arithmetic.
> > > > > > > > > >
> > > > > > > > > > Yes, I understand.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > That said, I wonder whether the x86 pattern here is more sensible,
> > > > > > > > > > > > since if you have a sequence of complex adds I'm not sure your
> > > > > > > > > > > > "linear verifier" gets things optimal? That is, in case this
> > > > > > > > > > > > is not a single complex operation but, in Ca + Cb, Cb ends up being
> > > > > > > > > > > > a complex expression. If the ARM complex vector operation
> > > > > > > > > > > > swaps even/odd lanes of the second operand then wouldn't it
> > > > > > > > > > > > be better (and easier) to match
> > > > > > > > > > > >
> > > > > > > > > > > > a0 = b0 - c0;
> > > > > > > > > > > > a1 = b1 - c1;
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I assume the second one should be a +?
> > > > > > > > > >
> > > > > > > > > > Yes, sorry.
> > > > > > > > > >
> > > > > > > > > > > > as
> > > > > > > > > > > >
> > > > > > > > > > > > a = cadd90 (b, perm(c, { 1, 0}))
> > > > > > > > > > > >
> > > > > > > > > > > > and make the "anticipated" permute of the second operand part
> > > > > > > > > > > > of the actual pattern and to be eventually optimized by
> > > > > > > > > > > > permute optimization? Because it's still cheaper than
> > > > > > > > > > > > what we have from the two-operator handling, namely
> > > > > > > > > > > > add, subtract and permute. The SLP trees pasted above
> > > > > > > > > > > > do suggest that you add the anticipated permute operation
> > > > > > > > > > > > so I wonder whether all the linearization is just premature here?
> > > > > > > > > > >
> > > > > > > > > > > Consider add270:
> > > > > > > > > > >
> > > > > > > > > > > for (int i=0; i < N; i++)
> > > > > > > > > > > c[i] = a[i] + (b[i] * I * I * I);
> > > > > > > > > > >
> > > > > > > > > > > note: Final SLP tree for instance 0x4461b30:
> > > > > > > > > > > note: node 0x436c9c0 (max_nunits=4, refcnt=2)
> > > > > > > > > > > note: op template: REALPART_EXPR <*_10> = _23;
> > > > > > > > > > > note: stmt 0 REALPART_EXPR <*_10> = _23;
> > > > > > > > > > > note: stmt 1 IMAGPART_EXPR <*_10> = _4;
> > > > > > > > > > > note: children 0x436ca38
> > > > > > > > > > > note: node 0x436ca38 (max_nunits=4, refcnt=2)
> > > > > > > > > > > note: op: VEC_PERM_EXPR
> > > > > > > > > > > note: stmt 0 _23 = _6 + _13;
> > > > > > > > > > > note: stmt 1 _4 = _12 - _7;
> > > > > > > > > > > note: lane permutation { 0[0] 1[1] }
> > > > > > > > > > > note: children 0x436cba0 0x436cc18
> > > > > > > > > > > note: node 0x436cba0 (max_nunits=1, refcnt=1)
> > > > > > > > > > > note: op template: _23 = _6 + _13;
> > > > > > > > > > > note: { }
> > > > > > > > > > > note: children 0x436cab0 0x436cb28
> > > > > > > > > > > note: node 0x436cab0 (max_nunits=4, refcnt=3)
> > > > > > > > > > > note: op template: _13 = REALPART_EXPR <*_3>;
> > > > > > > > > > > note: stmt 0 _13 = REALPART_EXPR <*_3>;
> > > > > > > > > > > note: stmt 1 _12 = IMAGPART_EXPR <*_3>;
> > > > > > > > > > > note: load permutation { 0 1 }
> > > > > > > > > > > note: node 0x436cb28 (max_nunits=4, refcnt=3)
> > > > > > > > > > > note: op template: _6 = IMAGPART_EXPR <*_5>;
> > > > > > > > > > > note: stmt 0 _6 = IMAGPART_EXPR <*_5>;
> > > > > > > > > > > note: stmt 1 _7 = REALPART_EXPR <*_5>;
> > > > > > > > > > > note: load permutation { 1 0 }
> > > > > > > > > > > note: node 0x436cc18 (max_nunits=1, refcnt=1)
> > > > > > > > > > > note: op template: _4 = _12 - _7;
> > > > > > > > > > > note: { }
> > > > > > > > > > > note: children 0x436cab0 0x436cb28
> > > > > > > > > > >
> > > > > > > > > > > and add_conj:
> > > > > > > > > > >
> > > > > > > > > > > for (int i=0; i < N; i++)
> > > > > > > > > > > c[i] = a[i] + conjf (b[i]);
> > > > > > > > > > >
> > > > > > > > > > > note: Final SLP tree for instance 0x4fbf5a0:
> > > > > > > > > > > note: node 0x505d910 (max_nunits=4, refcnt=2)
> > > > > > > > > > > note: op template: REALPART_EXPR <*_8> = _23;
> > > > > > > > > > > note: stmt 0 REALPART_EXPR <*_8> = _23;
> > > > > > > > > > > note: stmt 1 IMAGPART_EXPR <*_8> = _4;
> > > > > > > > > > > note: children 0x505d988
> > > > > > > > > > > note: node 0x505d988 (max_nunits=4, refcnt=2)
> > > > > > > > > > > note: op: VEC_PERM_EXPR
> > > > > > > > > > > note: stmt 0 _23 = _11 + _20;
> > > > > > > > > > > note: stmt 1 _4 = _10 - _19;
> > > > > > > > > > > note: lane permutation { 0[0] 1[1] }
> > > > > > > > > > > note: children 0x505daf0 0x505db68
> > > > > > > > > > > note: node 0x505daf0 (max_nunits=1, refcnt=1)
> > > > > > > > > > > note: op template: _23 = _11 + _20;
> > > > > > > > > > > note: { }
> > > > > > > > > > > note: children 0x505da00 0x505da78
> > > > > > > > > > > note: node 0x505da00 (max_nunits=4, refcnt=3)
> > > > > > > > > > > note: op template: _11 = REALPART_EXPR <*_3>;
> > > > > > > > > > > note: stmt 0 _11 = REALPART_EXPR <*_3>;
> > > > > > > > > > > note: stmt 1 _10 = IMAGPART_EXPR <*_3>;
> > > > > > > > > > > note: load permutation { 0 1 }
> > > > > > > > > > > note: node 0x505da78 (max_nunits=4, refcnt=3)
> > > > > > > > > > > note: op template: _20 = REALPART_EXPR <*_5>;
> > > > > > > > > > > note: stmt 0 _20 = REALPART_EXPR <*_5>;
> > > > > > > > > > > note: stmt 1 _19 = IMAGPART_EXPR <*_5>;
> > > > > > > > > > > note: load permutation { 0 1 }
> > > > > > > > > > > note: node 0x505db68 (max_nunits=1, refcnt=1)
> > > > > > > > > > > note: op template: _4 = _10 - _19;
> > > > > > > > > > > note: { }
> > > > > > > > > > > note: children 0x505da00 0x505da78
> > > > > > > > > > >
> > > > > > > > > > > These are virtually identical, aside from the first one having a permute in
> > > > > > > > > > > 0x436cb28 being {1, 0} and the one in 0x505da78 being {0, 1}. But they
> > > > > > > > > > > are quite different operations (in fact the conj case seems to match what
> > > > > > > > > > > x86 has).
> > > > > > > > > > >
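Both trees compute "plus in the even lane, minus in the odd lane". They differ only in the second operand's load permutation. With { 1 0 } the result is (a0 + b1, a1 - b0), i.e. a + b * I * I * I. With { 0 1 } it is (a0 + b0, a1 - b1), i.e. a + conj (b). The lane-level sketch below checks both against std::complex. All helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <complex>

using v2df = std::array<double, 2>;

// Even lane adds, odd lane subtracts: the shape shared by both SLP trees.
v2df plus_minus (v2df a, v2df b) { return {a[0] + b[0], a[1] - b[1]}; }

v2df swap_lanes (v2df v) { return {v[1], v[0]}; }

// add270: b's lanes arrive swapped (load permutation { 1 0 }).
v2df cadd_rot270 (v2df a, v2df b) { return plus_minus (a, swap_lanes (b)); }

// add_conj: b's lanes arrive in order (load permutation { 0 1 }).
v2df cadd_conj (v2df a, v2df b) { return plus_minus (a, b); }

// Reference for add270 computed with std::complex.
std::complex<double> ref_rot270 (std::complex<double> a, std::complex<double> b)
{
  const std::complex<double> I (0.0, 1.0);
  return a + b * I * I * I;
}
```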
> > > > > > > > > > > So the problem with not checking the permutes is that you would treat both of these
> > > > > > > > > > > the same and emit the instruction with the permute, which would produce correct
> > > > > > > > > > > code but not necessarily efficient code.
> > > > > > > > > > >
> > > > > > > > > > > Swapping a {0, 1} permute is trivial, but accepting it means accepting any random permute,
> > > > > > > > > > > including ones that require a general permute operation (TBL), which we cost quite
> > > > > > > > > > > high due to its impact on register allocation and the fact it requires an index register to be
> > > > > > > > > > > loaded from memory.
> > > > > > > > > >
> > > > > > > > > > Hmm. Having all these subtly different operations natively available
> > > > > > > > > > indeed complicates things. But then, given an even/odd plus/minus
> > > > > > > > > > operation without a way to infer what permutation we are looking at,
> > > > > > > > > > is there a good choice as to which of the even/odd lane instructions we
> > > > > > > > > > want to match? It sounds like it should be add_conj, no?
> > > > > > > > > >
> > > > > > > > > > That said, it looks like an ordering issue with the permute optimization
> > > > > > > > > > phase to me.
> > > > > > > > > >
> > > > > > > > > > So if we go with some heuristic then what you try to do is figure out
> > > > > > > > > > if one of the operands of the pattern matched operation is already
> > > > > > > > > > perfectly linear. For the operand on which the instruction can do a permutation,
> > > > > > > > > > the exact permute cannot matter since you don't seem to compute an
> > > > > > > > > > exact permute but emit the "anticipated" one and leave the rest to
> > > > > > > > > > be (hopefully) optimized later. The important part (cost-wise) seems
> > > > > > > > > > to be to not anticipate a permute where there is none.
> > > > > > > > > >
> > > > > > > > > > > This means we will likely end up rejecting such cases based on cost alone and no longer
> > > > > > > > > > > vectorize in these cases.
> > > > > > > > > >
> > > > > > > > > > Is that so? Without matching any pattern you'd have a vector plus and
> > > > > > > > > > a vector minus and then a tbl combining both?
> > > > > > > > > >
> > > > > > > > > > > The other case is when I don't even know how to make it "fit" in the instruction. Consider:
> > > > > > > > > > >
> > > > > > > > > > > for (int i=0; i < N; i+=2)
> > > > > > > > > > > {
> > > > > > > > > > > c[i] = a[i] - b[i];
> > > > > > > > > > > c[i+1] = a[i+1] + b[i];
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > Which becomes
> > > > > > > > > > >
> > > > > > > > > > > note: Final SLP tree for instance 0x44e25a0:
> > > > > > > > > > > note: node 0x45703e0 (max_nunits=2, refcnt=2)
> > > > > > > > > > > note: op template: *_7 = _8;
> > > > > > > > > > > note: stmt 0 *_7 = _8;
> > > > > > > > > > > note: stmt 1 *_13 = _14;
> > > > > > > > > > > note: children 0x4570458
> > > > > > > > > > > note: node 0x4570458 (max_nunits=2, refcnt=2)
> > > > > > > > > > > note: op: VEC_PERM_EXPR
> > > > > > > > > > > note: stmt 0 _8 = _4 - _6;
> > > > > > > > > > > note: stmt 1 _14 = _6 + _12;
> > > > > > > > > > > note: lane permutation { 0[0] 1[1] }
> > > > > > > > > > > note: children 0x45705c0 0x4570638
> > > > > > > > > > > note: node 0x45705c0 (max_nunits=1, refcnt=1)
> > > > > > > > > > > note: op template: _8 = _4 - _6;
> > > > > > > > > > > note: { }
> > > > > > > > > > > note: children 0x45704d0 0x4570548
> > > > > > > > > > > note: node 0x45704d0 (max_nunits=2, refcnt=3)
> > > > > > > > > > > note: op template: _4 = *_3;
> > > > > > > > > > > note: stmt 0 _4 = *_3;
> > > > > > > > > > > note: stmt 1 _12 = *_11;
> > > > > > > > > > > note: load permutation { 0 1 }
> > > > > > > > > > > note: node 0x4570548 (max_nunits=2, refcnt=3)
> > > > > > > > > > > note: op template: _6 = *_5;
> > > > > > > > > > > note: stmt 0 _6 = *_5;
> > > > > > > > > > > note: stmt 1 _6 = *_5;
> > > > > > > > > > > note: load permutation { 0 0 }
> > > > > > > > > > > note: node 0x4570638 (max_nunits=1, refcnt=1)
> > > > > > > > > > > note: op template: _14 = _6 + _12;
> > > > > > > > > > > note: { }
> > > > > > > > > > > note: children 0x45704d0 0x4570548
> > > > > > > > > > >
> > > > > > > > > > > Which I would need to work out on pen and paper to see if it can even work
> > > > > > > > > > > with the instruction (we generate quite awful code for this atm with float).
> > > > > > > > > >
> > > > > > > > > > Well, clearly the simple-minded match would add a perm node in
> > > > > > > > > > front of the b[i] load one and the permute optimization phase
> > > > > > > > > > would currently not elide it as no-op (or maybe it does, surely
> > > > > > > > > > it could).
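Worked through on paper, the example does fit cadd90, whose lanes are (a0 - x1, a1 + x0). Take x = (b0, b0), which is the { 0 0 } load of b. The result is then (a0 - b0, a1 + b0), exactly the two statements. The anticipated even/odd swap of x is a no-op because both lanes are equal, which is the case Richard says permute optimization could elide. The check below uses hypothetical helper names.

```cpp
#include <array>
#include <cassert>

using v2df = std::array<double, 2>;

v2df swap_lanes (v2df v) { return {v[1], v[0]}; }

// Load permutation { 0 0 }: both lanes read element 0.
v2df splat_lane0 (v2df v) { return {v[0], v[0]}; }

// COMPLEX_ADD_ROT90 lane semantics: (a0 - x1, a1 + x0).
v2df cadd_rot90 (v2df a, v2df x) { return {a[0] - x[1], a[1] + x[0]}; }

// The loop body: c[i] = a[i] - b[i]; c[i+1] = a[i+1] + b[i];
v2df loop_body (v2df a, v2df b) { return {a[0] - b[0], a[1] + b[0]}; }
```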
> > > > > > > > > >
> > > > > > > > > > > So the problem here is I can't go back to the old code should costing become
> > > > > > > > > > > very expensive because of the permute it would need to insert.
> > > > > > > > > > >
> > > > > > > > > > > So I needed some way to reject the cases I know wouldn't generate good code.
> > > > > > > > > >
> > > > > > > > > > Yes - I think we do need to know the pattern is an obvious improvement
> > > > > > > > > > to the non-pattern state. But I think it should always be due to the
> > > > > > > > > > removed add or subtract instruction? Or are the complex instructions
> > > > > > > > > > more expensive than a single add or subtract?
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > How would we name the x86 instruction patterns which implement
> > > > > > > > > > > >
> > > > > > > > > > > > a[i] = b[i] - c[i];
> > > > > > > > > > > > a[i+1] = b[i+1] + c[i+1];
> > > > > > > > > > > >
> > > > > > > > > > > > ? Those do not implement a full complex operation AFAICS
> > > > > > > > > > > > so would we name them plusminus<mode>3 and minusplus<mode>3
> > > > > > > > > > > > and fmas<mode>4, fmsa<mode>4? They'd be the preferred match
> > > > > > > > > > > > (no anticipated permute necessary)?
> > > > > > > > > > >
> > > > > > > > > > > Yes, that makes sense if the instructions have no expectation of a
> > > > > > > > > > > permute.
> > > > > > > > > > >
> > > > > > > > > > > So the difficult part here is I don't know how to find the right balance.
> > > > > > > > > > > You're right in that we should be able to accept the add_conj case and
> > > > > > > > > > > just emit a permute there, as we have a single instruction for that permute.
> > > > > > > > > > >
> > > > > > > > > > > I also agree with you that it shouldn't be doing "costing" so early on,
> > > > > > > > > > > but if I don't do so, my only choices here are that it turns out to be cheap to do so (WIN),
> > > > > > > > > > > or it turns out to be expensive and we fail vectorization entirely (well, the loop vectorizer
> > > > > > > > > > > would probably try without SLP enabled and generate *something*, but
> > > > > > > > > > > the non-loop SLP is a bit out of luck...).
> > > > > > > > > > >
> > > > > > > > > > > If only there was a way to compare the costs for the non-pattern-matched tree vs the
> > > > > > > > > > > pattern-matched one. But that would be quite a big addition at this point.
> > > > > > > > > >
> > > > > > > > > > But what matters is of course the cost after permute optimization did
> > > > > > > > > > its work.
> > > > > > > > > >
> > > > > > > > > > So I wonder if we can match cadd_conj during pattern matching and
> > > > > > > > > > wire turning that into cadd90/270 during optimize_slp when we know
> > > > > > > > > > the permute that is coming along the child? Yes, that would put
> > > > > > > > > > knowledge of all of it into that point but thinking of this as
> > > > > > > > > > all doable in a separate pattern matching (without re-implementing
> > > > > > > > > > all of the permute optimization) doesn't look like it will work?
> > > > > > > > > >
> > > > > > > > > > That is, when materializing a permute on a cadd_conj child we
> > > > > > > > > > can instead turn it into a cadd90/270? We probably need to turn
> > > > > > > > > > the materialization loop into an ordered one based on the RPO
> > > > > > > > > > order computed earlier.
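The suggested rewrite relies on a simple lane identity. A plus/minus (cadd_conj-shaped) node whose second child carries an even/odd swap is exactly cadd270 of the unswapped child. Likewise, a minus/plus node over a swapped child is cadd90. So at the point where the swap would be materialized, folding it into the operation preserves the values. The quick check below works on one complex element, with hypothetical helper names.

```cpp
#include <array>
#include <cassert>

using v2df = std::array<double, 2>;

v2df swap_lanes (v2df v) { return {v[1], v[0]}; }

// Mixed-sign operations without any lane swap.
v2df plus_minus (v2df a, v2df b) { return {a[0] + b[0], a[1] - b[1]}; }  // cadd_conj shape
v2df minus_plus (v2df a, v2df b) { return {a[0] - b[0], a[1] + b[1]}; }  // addsub shape

// Complex additions with the second operand rotated (b = b0 + b1*I).
v2df cadd_rot90 (v2df a, v2df b)  { return {a[0] - b[1], a[1] + b[0]}; }  // a + b*I
v2df cadd_rot270 (v2df a, v2df b) { return {a[0] + b[1], a[1] - b[0]}; }  // a - b*I
```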
> > > > > > > > > >
> > > > > > > > > > And if we just match cadd_conj (and the variant with even/odd
> > > > > > > > > > swapped) we could do this directly during SLP discovery as well
> > > > > > > > > > where we handle two_operators. Do you have
> > > > > > > > > >
> > > > > > > > > > Now the question is of course how this interacts with mul and fma/s
> > > > > > > > > > but I guess it's always the adds that introduce all the variants.
> > > > > > > > > > The mla/mls patterns have a comment
> > > > > > > > > >
> > > > > > > > > > +;; The complex mla/mls operations always need to expand to two instructions.
> > > > > > > > > > +;; The first operation does half the computation and the second does the
> > > > > > > > > > +;; remainder. Because of this, expand early.
> > > > > > > > > >
> > > > > > > > > > so what are the building blocks there? It makes it sound like
> > > > > > > > > > this is a widening multiplication or so?
> > > > > > > > >
> > > > > > > > > So following up myself after reading the ARM docs regarding
> > > > > > > > > those. It seems this is about FCMLA where two of those can be
> > > > > > > > > used to perform full complex multiplication. I think we want
> > > > > > > > > to model the individual FCMLA operations and not the complex
> > > > > > > > > multiplication itself and also expose the FCMLAs as optabs,
> > > > > > > > > not complex multiplication. There seem to be four variants
> > > > > > > > > (as opposed to the two cadd ones).
> > > > > > > > >
> > > > > > > > > rot '00'
> > > > > > > > > a[2*i] += b[2*i] * c[2*i]
> > > > > > > > > a[2*i+1] += b[2*i] * c[2*i+1]
> > > > > > > > >
> > > > > > > > > rot '01'
> > > > > > > > > a[2*i] += b[2*i+1] * -c[2*i+1]
> > > > > > > > > a[2*i+1] += b[2*i+1] * c[2*i]
> > > > > > > > >
> > > > > > > > > rot '10'
> > > > > > > > > a[2*i] += b[2*i] * -c[2*i]
> > > > > > > > > a[2*i+1] += b[2*i] * -c[2*i+1]
> > > > > > > > >
> > > > > > > > > rot '11'
> > > > > > > > > a[2*i] += b[2*i+1] * c[2*i+1]
> > > > > > > > > a[2*i+1] += b[2*i+1] * -c[2*i]
> > > > > > > > >
> > > > > > > > > where in practice we'll see the negate handled by turning
> > > > > > > > > the add into a subtract which then means the thing to
> > > > > > > > > pattern match is scalar by vector multiplication? Again
> > > > > > > > > "which" scalar (lane) against which permute of the other
> > > > > > > > > vector is going to interact with permute optimizations.
> > > > > > > > > But that leaves us with almost nothing special from a regular
> > > > > > > > > multiplication - the mixed sign operation will be the add
> > > > > > > > > again ...
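The four rotations listed above can be modelled per complex element, with the even lane real and the odd lane imaginary. Chaining rot '00' and then rot '01' on the same accumulator yields a full complex multiply-accumulate, a += b * c. Chaining '10' then '11' yields a -= b * c. That is why two of these operations make up one complex multiplication. This scalar sketch models only the semantics listed here. fcmla_step and complex_mla are hypothetical names, not the ACLE intrinsics.

```cpp
#include <array>
#include <cassert>
#include <complex>

using v2df = std::array<double, 2>;

// One step on a single complex element; ROT 0..3 stands for the rotations
// '00', '01', '10', '11' listed above.
void fcmla_step (v2df &a, v2df b, v2df c, unsigned rot)
{
  switch (rot)
    {
    case 0: a[0] += b[0] * c[0];  a[1] += b[0] * c[1];  break;
    case 1: a[0] += b[1] * -c[1]; a[1] += b[1] * c[0];  break;
    case 2: a[0] += b[0] * -c[0]; a[1] += b[0] * -c[1]; break;
    case 3: a[0] += b[1] * c[1];  a[1] += b[1] * -c[0]; break;
    default: assert (false);
    }
}

// Two chained steps on the same accumulator: rot '00' then rot '01'.
v2df complex_mla (v2df a, v2df b, v2df c)
{
  fcmla_step (a, b, c, 0);
  fcmla_step (a, b, c, 1);
  return a;
}
```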
> > > > > > > >
> > > > > > > > But the difficulty here is that you need to have both calculations of
> > > > > > > > Rot '00' and rot '01' for instance work together, not in parallel.
> > > > > > > >
> > > > > > > > That is, you have to have Rot '00' go before Rot '01' so the accumulation
> > > > > > > > value is correct. You also don't want the vectorizer to think it needs a load
> > > > > > > > duplicate. Since e.g. rot '00' only uses b[2*i] it would need a lane perm loading
> > > > > > > > [0 0] and you don't want to materialize that.
> > > > > > > >
> > > > > > > > Partially due to the costing, but also it's really hard to undo permutes in RTL.
> > > > > > > >
> > > > > > > > So I don't think treating them as separate instructions is the best thing here.
> > > > > > > >
> > > > > > > > If instead it's treated semantically as what you want to do, this gives me some
> > > > > > > > freedom to operate. E.g. we don't have these instructions in NEON and SVE1 on
> > > > > > > > integers. But for certain modes they're easy to emulate.
> > > > > > > >
> > > > > > > > It's a lot easier for a target just to have to implement COMPLEX_MUL rather than
> > > > > > > > the AArch64 semantics for these instructions.
> > > > > > >
> > > > > > > That's true but we'd leave using the instructions for cases where it
> > > > > > > really only does "half" of the COMPLEX_MUL on the plate. On x86
> > > > > > > the instructions are again even less capable by omitting the permute
> > > > > > > and just doing even/odd plus-minus FMA variants.
> > > > > >
> > > > > > But I really don't see how this can be done.
> > > > > >
> > > > > > If you only have half the operation, you end up with:
> > > > > >
> > > > > > rot '00'
> > > > > > a[2*i] += b[2*i] * c[2*i]
> > > > > > a[2*i+1] += b[2*i] * c[2*i+1]
> > > > > >
> > > > > > Which afaik can't be done in the scalar pattern matcher because when seeing a[2*i] you'd
> > > > > > need to know about a[2*i+1].
> > > > > >
> > > > > > If you do it after SLP construction that's a completely different tree than you'd get from
> > > > > > COMPLEX_MUL. So this form should never appear in your SLP tree if you have both operations
> > > > > > to form a valid complex operation.
> > > > >
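A minimal C sketch of the "half" operations being discussed may help here. It assumes Arm FCMLA-style semantics for rotations 0 and 90. The rot 90 half is an assumption and is not spelled out in the thread. Neither half alone is a complex multiply-accumulate. Applying rot 0 and then rot 90 to the same accumulator yields a += b * c on interleaved { re, im } pairs. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers modelling one rotation of a complex
   multiply-accumulate on interleaved data: even lanes hold real parts,
   odd lanes hold imaginary parts.  n is the number of scalars.  */

/* rot 0: only the real part of b participates (b[i] in both lanes).  */
static void cmla_rot0 (float *a, const float *b, const float *c, size_t n)
{
  assert (n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      a[i] += b[i] * c[i];
      a[i + 1] += b[i] * c[i + 1];
    }
}

/* rot 90 (assumed FCMLA #90 semantics): only the imaginary part of b
   participates, against swapped lanes of c with the real lane negated.  */
static void cmla_rot90 (float *a, const float *b, const float *c, size_t n)
{
  assert (n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      a[i] -= b[i + 1] * c[i + 1];
      a[i + 1] += b[i + 1] * c[i];
    }
}

/* Only the combination of both halves is a full a += b * c.  */
static void cmla_full (float *a, const float *b, const float *c, size_t n)
{
  cmla_rot0 (a, b, c, n);
  cmla_rot90 (a, b, c, n);
}
```

For example, with a = 1+1i, b = 2+3i and c = 4+5i, cmla_full leaves a = -6+23i. cmla_rot0 alone produces only the b.re contributions, which is why a matcher seeing just one half cannot form a COMPLEX_MUL on its own.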
> > > > > But the sequence of adds/mults and permutes can appear outside of complex
> > > > > context. And the SLP pattern matcher would miss the above even though
> > > > > there's a 1:1 instruction available because it only looks for the
> > > > > combination of two instructions which make up a complex multiplication.
> > > > >
> > > > > > > So looking at the patch again I see
> > > > > > >
> > > > > > > void
> > > > > > > complex_pattern::build (slp_tree_to_load_perm_map_t *perm_cache,
> > > > > > > vec_info *vinfo)
> > > > > > > {
> > > > > > > ...
> > > > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), ix, tmp)
> > > > > > > {
> > > > > > > slp_tree vnode = NULL;
> > > > > > > if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > > > > > > nodes.safe_push (vnode);
> > > > > > >
> > > > > > > so we're relying on an exact precise lane order being detected
> > > > > > > by the linear stuff rather than it being a heuristic.
> > > > > > >
> > > > > > > I'd have materialized the very specific reverse permute
> > > > > > > anticipated by the actual chosen complex IFN on the second
> > > > > > > operand. That's never going to be incorrect then, at most
> > > > > > > sub-optimal. Your variant might be incorrect and also
> > > > > > > sub-optimal (you still rely on permute optimization to
> > > > > > > cancel the linearization permute).
> > > > > >
> > > > > > But to materialize the reverse permute I'd have to know the
> > > > > > original permute. But what happens if you have both ADDSUB
> > > > > > and COMPLEX_ADD? For SVE for instance we can easily emulate
> > > > > > ADDSUB using predication, which is likely cheaper if the data requires
> > > > > > no permutation.
> > > > >
> > > > > Well, if the pattern is a CADD90 then there is a specific permute
> > > > > done on the second operand by this very pattern operation. You
> > > > > then insert the reverse on the edge to the second operand.
> > > > >
> > > > > The permute is specified by the CADD90, not by whatever permute
> > > > > arrives because you have to reflect what CADD90 does to the
> > > > > rest of the SLP tree. No?
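The following C sketch assumes that CADD90's intrinsic permute of operand 2 is the pair swap { 1, 0 }, as Richard states later in the thread. It shows that this view agrees with the md.texi description in the patch. It also shows why inserting the reverse permute on the edge to the operand is always safe: { 1, 0 } is its own inverse, so the two permutes cancel. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64

/* Swap each { re, im } pair: the { 1, 0 } lane permute.  It is its own
   inverse, so applying it twice restores the original order.  */
static void swap_pairs (float *dst, const float *src, size_t n)
{
  assert (n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      dst[i] = src[i + 1];
      dst[i + 1] = src[i];
    }
}

/* cadd90 as documented: c[i] = a[i] - b[i+1], c[i+1] = a[i+1] + b[i].  */
static void cadd90_ref (float *c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i += 2)
    {
      c[i] = a[i] - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}

/* The same operation as an intrinsic { 1, 0 } permute of operand 2
   followed by an even-lane subtract / odd-lane add.  */
static void cadd90_via_perm (float *c, const float *a, const float *b,
                             size_t n)
{
  float bp[MAX_N];
  assert (n <= MAX_N);
  swap_pairs (bp, b, n);
  for (size_t i = 0; i < n; i += 2)
    {
      c[i] = a[i] - bp[i];
      c[i + 1] = a[i + 1] + bp[i + 1];
    }
}
```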
> > > > >
> > > > > > > I hope the specific review comments do not get lost in the
> > > > > > > thread discussion the general approach of matching the ARM
> > > > > > > complex ops ;)
> > > > > > >
> > > > > >
> > > > > > I usually extract them into one place before I start working.
> > > > > >
> > > > > > But atm I'm stuck a bit as I don't think we've agreed on an approach
> > > > > > that would also work for MUL and MLA.
> > > > >
> > > > > Honestly I don't have a good idea that is guaranteed to work. I think
> > > > > you showed that your approach works to the extent you tested it and
> > > > > thus this is the way forward if we want to make GCC 11. As said on
> > > > > IRC these patterns somewhat feel like they need a global [permute]
> > > > > optimization framework rather than a local pattern matching
> > > > > (the Intel x86 vector extensions with just even/odd lane negates
> > > > > would be pure local matches).
> > > > >
> > > > > So below are some more comments on the lane tracking.
> > > > >
> > > > > static load_permutation_t
> > > > > linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root,
> > > > > bool *linear)
> > > > > {
> > > > > ...
> > > > > if ((tmp = perm_cache->get (root)) != NULL)
> > > > > {
> > > > > *linear = is_linear_load_p (*tmp);
> > > > > return *tmp;
> > > > > }
> > > > >
> > > > > it would be nice to avoid is_linear_load_p on cached entries which
> > > > > means we'd like to reflect it in the cache itself. From what I
> > > > > understand (though it is not documented), the following holds for
> > > > > cache entries:
> > > > >
> > > > > vNULL - nothing is known about 'node' (but we've visited it), kind of
> > > > > VARYING
> > > > > lperm - the lanes are permuted according to lperm in 'node' based on
> > > > > some unknown node(s)
> > > > >
> > > > > we could make the cache entry a std::pair<enum, load_permutation_t>
> > > > > with the enum denoting UNKNOWN, LINEAR, and PERMUTED where for the
> > > > > first two the load_permutation_t could be vNULL.
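The suggested std::pair<enum, load_permutation_t> cache entry could look roughly like this plain-C sketch. The type and function names are hypothetical. "Linear" is assumed to mean the identity lane order { 0, 1, ..., n-1 }. Classifying once, when the entry is created, means a cache hit never has to re-run a linearity check.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LANES 16

/* Hypothetical lane-state kinds for the permute cache.  */
enum lane_state_kind
{
  LANES_UNKNOWN,   /* Visited, but nothing is known (VARYING).  */
  LANES_LINEAR,    /* Lanes in order; no permutation stored.  */
  LANES_PERMUTED   /* Lanes permuted according to perm[].  */
};

struct lane_state
{
  enum lane_state_kind kind;
  size_t nlanes;               /* Only meaningful for LANES_PERMUTED.  */
  unsigned perm[MAX_LANES];
};

/* Build a cache entry from a load permutation.  A missing permutation
   (the vNULL case) maps to LANES_UNKNOWN.  */
static struct lane_state
make_lane_state (const unsigned *perm, size_t n)
{
  struct lane_state s = { LANES_UNKNOWN, 0, { 0 } };
  if (perm == NULL || n == 0)
    return s;
  assert (n <= MAX_LANES);
  s.kind = LANES_LINEAR;
  for (size_t i = 0; i < n; i++)
    if (perm[i] != i)
      s.kind = LANES_PERMUTED;
  if (s.kind == LANES_PERMUTED)
    {
      s.nlanes = n;
      for (size_t i = 0; i < n; i++)
        s.perm[i] = perm[i];
    }
  return s;
}
```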
> > > > >
> > > > > /* If it's a load node, then just read the load permute. */
> > > > > if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> > > > > {
> > > > > loads = SLP_TREE_LOAD_PERMUTATION (root);
> > > > > perm_cache->put (root, loads);
> > > > > if (!is_linear_load_p (loads))
> > > > > return loads;
> > > > >
> > > > > since loads are terminal we can always return 'loads' and
> > > > > init *linear from is_linear_load_p (loads)?
> > > > >
> > > > > And as said elsewhere I'd simply treat vect_external_defs and
> > > > >
> > > > > else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> > > > > return vNULL;
> > > > >
> > > > > as linear. Alternatively there could be a fourth state, VTOP,
> > > > > meaning to merge with any other state (aka, we can permute
> > > > > externals as we wish at no cost).
> > > > >
> > > > > slp_tree child;
> > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > > > > {
> > > > > loads = linear_loads_p (perm_cache, child, linear);
> > > > > if ((!*linear && !is_perm) || !loads.exists ())
> > > > > return loads;
> > > > >
> > > > > all_loads.safe_push (loads);
> > > > > }
> > > > >
> > > > > at merges we have to treat any UNKNOWN child by returning
> > > > > UNKNOWN, any VTOP child we can ignore (or pass on if all
> > > > > are VTOP), and both LINEAR and PERMUTED need to match
> > > > > for all children to do anything sensible, otherwise we
> > > > > need to fall back to UNKNOWN.
> > > > >
> > > > > Unless we can conservatively treat UNKNOWN as LINEAR
> > > > > (just assume we're starting a new vector here).
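The merge rules above can be written as a small lattice join. This C sketch is only an assumption about how they might be encoded, and its names are hypothetical. VTOP (externals, which can be permuted freely) acts as the identity of the merge. UNKNOWN is absorbing. LINEAR and PERMUTED must agree exactly. It does not implement the alternative of treating UNKNOWN as LINEAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LANES 16

/* Hypothetical lattice: VTOP on top, UNKNOWN at the bottom.  */
enum lane_kind { L_VTOP, L_LINEAR, L_PERMUTED, L_UNKNOWN };

struct lanes
{
  enum lane_kind kind;
  size_t n;
  unsigned perm[MAX_LANES];   /* Valid only for L_PERMUTED.  */
};

static bool same_perm (const struct lanes *a, const struct lanes *b)
{
  return a->n == b->n
         && memcmp (a->perm, b->perm, a->n * sizeof a->perm[0]) == 0;
}

/* Merge the lane states of two children of one SLP node.  */
static struct lanes merge_lanes (struct lanes a, struct lanes b)
{
  struct lanes unknown = { L_UNKNOWN, 0, { 0 } };
  assert (a.n <= MAX_LANES && b.n <= MAX_LANES);
  if (a.kind == L_UNKNOWN || b.kind == L_UNKNOWN)
    return unknown;            /* Any UNKNOWN child makes the node UNKNOWN.  */
  if (a.kind == L_VTOP)
    return b;                  /* Externals adapt to the other child.  */
  if (b.kind == L_VTOP)
    return a;
  if (a.kind != b.kind)
    return unknown;            /* LINEAR vs PERMUTED: no common order.  */
  if (a.kind == L_PERMUTED && !same_perm (&a, &b))
    return unknown;            /* Different permutes do not merge.  */
  return a;
}
```

Merging more than two children is a left fold of merge_lanes, starting from VTOP.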
> > > > >
> > > > > if (is_perm)
> > > > > {
> > > > >
> > > > > and this then permutes the common lane state.
> > > > >
> > > > > Since you pre-load with vNULL anything participating in
> > > > > cycles will drop to UNKNOWN, but I guess that's fine.
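"Permuting the common lane state" at a VEC_PERM node amounts to composing the node's lane selection with the merged state of its input. The hedged C sketch below handles only the single-input case after the merge. Its name is hypothetical. Here in[i] records which original lane sits in lane i, and sel[i] records which input lane output lane i reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: propagate a lane state through a permute node.
   out[i] = in[sel[i]], i.e. which original lane ends up in lane i.  */
static void permute_lane_state (unsigned *out, const unsigned *in,
                                const unsigned *sel, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (sel[i] < n);
      out[i] = in[sel[i]];
    }
}
```

A LINEAR input turns into exactly the node's selection. Applying the pair swap { 1, 0, 3, 2 } twice brings a PERMUTED state back to LINEAR.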
> > > > >
> > > > > I think we should not need vect_slp_make_linear at all.
> > > > > Each and every pattern recognized has a specific intrinsic
> > > > > permute we have to reflect - the permute analysis above
> > > > > is just to decide which of the patterns we want to choose - it
> > > > > can be seen as a heuristic (for anything external or for
> > > > > the case we go from UNKNOWN to newly LINEAR).
> > > >
> > > > Don't I still need it? Albeit in a simplified form, since I need to
> > > > materialize the inverse permute?
> > > >
> > > > But what it doesn't need to do anymore is re-analyze the permute?
> > >
> > > Yes, you don't need vect_slp_make_linear or to do any analysis;
> > > based on the chosen IFN you simply materialize the inverse of the
> > > permute the IFN does internally. So if
> > > cadd90 internally permutes operand 2 as { 1, 0 } then you
> > > emit the inverse on the child node (which is also { 1, 0 } here).
> > > The idea is of course that the internal permute and the emitted
> > > one cancel.
> > >
> > > Richard.
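Here is a hedged C sketch of the "emit the inverse so the two cancel" idea. It uses hypothetical helper names and the convention out[i] = in[perm[i]]. For the { 1, 0 } pair swap the inverse is { 1, 0 } itself. In general this is not true: { 1, 2, 0 } inverts to { 2, 0, 1 }.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: inv[perm[i]] = i, so applying perm and inv in either
   order restores the original lane order.  */
static void invert_perm (unsigned *inv, const unsigned *perm, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (perm[i] < n);
      inv[perm[i]] = (unsigned) i;
    }
}

/* out[i] = in[perm[i]].  */
static void apply_perm (unsigned *out, const unsigned *in,
                        const unsigned *perm, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[perm[i]];
}

static bool is_identity (const unsigned *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (p[i] != i)
      return false;
  return true;
}
```

In use, the inverse is materialized on the child edge first and the IFN's intrinsic permute runs afterwards. The lanes then reach the operation in their original order.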
> > >
> > > > Regards,
> > > > Tamar
> > > >
> > > > >
> > > > > That would leave the multiplication case where you want to
> > > > > merge the splat uses { a[0], a[0] } and { a[1], a[1] }.
> > > > > But there it's the same as with the intrinsic permutes
> > > > > we model - we have an extract so we anticipate a merge.
> > > > > Sth like
> > > > >
> > > > > note: node 0x4779a00 (max_nunits=4, refcnt=2)
> > > > > note: op template: _9 = IMAGPART_EXPR <*_3>;
> > > > > note: stmt 0 _9 = IMAGPART_EXPR <*_3>;
> > > > > note: stmt 1 _9 = IMAGPART_EXPR <*_3>;
> > > > > note: load permutation { 1 1 }
> > > > > note: node 0x4779898 (max_nunits=4, refcnt=2)
> > > > > note: op template: _10 = REALPART_EXPR <*_3>;
> > > > > note: stmt 0 _10 = REALPART_EXPR <*_3>;
> > > > > note: stmt 1 _10 = REALPART_EXPR <*_3>;
> > > > > note: load permutation { 0 0 }
> > > > >
> > > > > add:
> > > > >
> > > > > node
> > > > > op: VEC_PERM_EXPR
> > > > > lane permutation { 0[0], 1[1] }
> > > > > children 0x4779898 0x4779a00
> > > > >
> > > > > note when both children are external this will currently
> > > > > break so we do have to "fold" that by simplifying it to
> > > > > a new external node extracting from the appropriate lanes.
> > > > >
> > > > > So I think it should work doing it this way. Then the
> > > > > most simplistic lane permute analysis would just look
> > > > > at the node itself and treat any not load-permuted node
> > > > > as LINEAR while returning the actual load permutation
> > > > > for load-permuted nodes.
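This is a hedged C model of the proposed VEC_PERM node. Each output lane is written child[lane], so { 0[0], 1[1] } takes lane 0 of the REALPART splat (load permutation { 0 0 }) and lane 1 of the IMAGPART splat (load permutation { 1 1 }). The result is a single { re, im } vector. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one output lane of a lane permutation, child[lane].  */
struct lane_ref
{
  unsigned child;
  unsigned lane;
};

/* Build the output vector by reading each selected child lane.  */
static void apply_lane_perm (double *out, const double *const *children,
                             size_t nchildren, const struct lane_ref *sel,
                             size_t nlanes)
{
  for (size_t i = 0; i < nlanes; i++)
    {
      assert (sel[i].child < nchildren);
      out[i] = children[sel[i].child][sel[i].lane];
    }
}
```

With children { re, re } and { im, im } and the selection { 0[0], 1[1] }, the output is { re, im }. The splats are consumed without materializing a load duplicate.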
> > > > >
> > > > > Richard.
> > > > >
> > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > >
> > > > > > > Richard.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > >
> > > > > > > > > It might be feasible to handle the case of the SLP children
> > > > > > > > > being loads themselves in the pattern matching process but
> > > > > > > > > I guess you've run into more complex situations since you
> > > > > > > > > implemented that "propagation" stuff? The testcases included
> > > > > > > > > on the branch seem to be simple direct cases of the ops operating
> > > > > > > > > on memory.
> > > > > > > > >
> > > > > > > > > So in the end matching the ARM operations boils down to
> > > > > > > > > exactly tracing participating lanes which sounds more like
> > > > > > > > > a dataflow problem rather than a simple (local) pattern matching
> > > > > > > > > one.
> > > > > > > > >
> > > > > > > > > Meh.
> > > > > > > > >
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Unfortunately
> > > > > > > > > > the patterns are half regular RTL and half unspec so they don't
> > > > > > > > > > really specify what is done semantically :/ It would be nice
> > > > > > > > > > if the patches with the aarch64 backend changes would be on
> > > > > > > > > > trunk already ... (on the branch I don't see anything related
> > > > > > > > > > to add_conj for example)
> > > > > > > > > >
> > > > > > > > > > Btw, do you have any real-world cases that we want to optimize
> > > > > > > > > > where there's more than a single to-be-matched operation
> > > > > > > > > > operating on memory?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Richard.
> > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Tamar
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks (I hope we can simplify stuff further),
> > > > > > > > > > > > Richard.
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Tamar
> > > > > > > > > > > > >
> > > > > > > > > > > > > > gcc/ChangeLog:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > * tree-vect-slp-patterns.c: New file.
> > > > > > > > > > > > > > * Makefile.in: Add it.
> > > > > > > > > > > > > > * doc/passes.texi: Document it.
> > > > > > > > > > > > > > * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
> > > > > > > > > > > > > > * optabs.def (cadd90_optab, cadd270_optab): New.
> > > > > > > > > > > > > > * doc/md.texi: Document them.
> > > > > > > > > > > > > > * tree-vect-slp.c:
> > > > > > > > > > > > > > (vect_free_slp_instance, vect_create_new_slp_node): Export.
> > > > > > > > > > > > > > (vect_match_slp_patterns_2, vect_match_slp_patterns): New.
> > > > > > > > > > > > > > (vect_analyze_slp): Use it.
> > > > > > > > > > > > > > * tree-vectorizer.h (vect_free_slp_tree): Export.
> > > > > > > > > > > > > > (enum _complex_operation): Forward declare.
> > > > > > > > > > > > > > (class vect_pattern): New
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > * lib/target-supports.exp
> > > > > > > > > > > > > > (check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
> > > > > > > > > > > > > > (check_effective_target_vect_complex_add_byte
> > > > > > > > > > > > > > ,check_effective_target_vect_complex_add_int
> > > > > > > > > > > > > > ,check_effective_target_vect_complex_add_short
> > > > > > > > > > > > > > ,check_effective_target_vect_complex_add_long
> > > > > > > > > > > > > > ,check_effective_target_vect_complex_add_half
> > > > > > > > > > > > > > ,check_effective_target_vect_complex_add_float
> > > > > > > > > > > > > > ,check_effective_target_vect_complex_add_double): New.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-byte.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-int.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-long.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-short.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/complex-add-template.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/complex-operations-run.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/complex-operations.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-byte.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-int.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-long.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-short.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-unsigned-int.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-unsigned-long.c: New test.
> > > > > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-unsigned-short.c: New test.
> > > > > > > > > > > > >
> > > > > > > > > > > > > --- inline copy of patch --
> > > > > > > > > > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > > > > > > > > > > > index 778ec09c75d9af1cb9f2d5e7582b948c0397db65..d80657b089829fa30cede8bcfe036dda0ec06682 100644
> > > > > > > > > > > > > --- a/gcc/Makefile.in
> > > > > > > > > > > > > +++ b/gcc/Makefile.in
> > > > > > > > > > > > > @@ -1646,6 +1646,7 @@ OBJS = \
> > > > > > > > > > > > > tree-vect-loop.o \
> > > > > > > > > > > > > tree-vect-loop-manip.o \
> > > > > > > > > > > > > tree-vect-slp.o \
> > > > > > > > > > > > > + tree-vect-slp-patterns.o \
> > > > > > > > > > > > > tree-vectorizer.o \
> > > > > > > > > > > > > tree-vector-builder.o \
> > > > > > > > > > > > > tree-vrp.o \
> > > > > > > > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > > > > > > > > > > index da8c9a283dd42e2b3078ed5f370a37180ee0b538..2a030a1d7373cd2b5837aa1c99936a6a4e4e1480 100644
> > > > > > > > > > > > > --- a/gcc/doc/md.texi
> > > > > > > > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > > > > > > > @@ -6154,6 +6154,54 @@ floating-point mode.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This pattern is not allowed to @code{FAIL}.
> > > > > > > > > > > > >
> > > > > > > > > > > > > +@cindex @code{cadd90@var{m}3} instruction pattern
> > > > > > > > > > > > > +@item @samp{cadd90@var{m}3}
> > > > > > > > > > > > > > +Perform vector add and subtract on even/odd number pairs. The operation being
> > > > > > > > > > > > > > +matched is semantically described as
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +@smallexample
> > > > > > > > > > > > > > + for (int i = 0; i < N; i += 2)
> > > > > > > > > > > > > > + @{
> > > > > > > > > > > > > > + c[i] = a[i] - b[i+1];
> > > > > > > > > > > > > > + c[i+1] = a[i+1] + b[i];
> > > > > > > > > > > > > > + @}
> > > > > > > > > > > > > > +@end smallexample
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +This operation is semantically equivalent to performing a vector addition of
> > > > > > > > > > > > > > +complex numbers in operand 1 with operand 2 rotated by 90 degrees around
> > > > > > > > > > > > > > +the Argand plane and storing the result in operand 0.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +In GCC lane ordering the real part of the number must be in the even lanes with
> > > > > > > > > > > > > > +the imaginary part in the odd lanes.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +The operation is only supported for vector modes @var{m}.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +This pattern is not allowed to @code{FAIL}.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +@cindex @code{cadd270@var{m}3} instruction pattern
> > > > > > > > > > > > > > +@item @samp{cadd270@var{m}3}
> > > > > > > > > > > > > > +Perform vector add and subtract on even/odd number pairs. The operation being
> > > > > > > > > > > > > > +matched is semantically described as
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +@smallexample
> > > > > > > > > > > > > > + for (int i = 0; i < N; i += 2)
> > > > > > > > > > > > > > + @{
> > > > > > > > > > > > > > + c[i] = a[i] + b[i+1];
> > > > > > > > > > > > > > + c[i+1] = a[i+1] - b[i];
> > > > > > > > > > > > > > + @}
> > > > > > > > > > > > > > +@end smallexample
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +This operation is semantically equivalent to performing a vector addition of
> > > > > > > > > > > > > > +complex numbers in operand 1 with operand 2 rotated by 270 degrees around
> > > > > > > > > > > > > > +the Argand plane and storing the result in operand 0.
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +In GCC lane ordering the real part of the number must be in the even lanes with
> > > > > > > > > > > > > > +the imaginary part in the odd lanes.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +The operation is only supported for vector modes @var{m}.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +This pattern is not allowed to @code{FAIL}.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > @cindex @code{ffs@var{m}2} instruction pattern
> > > > > > > > > > > > > @item @samp{ffs@var{m}2}
> > > > > > > > > > > > > > Store into operand 0 one plus the index of the least significant 1-bit
> > > > > > > > > > > > > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> > > > > > > > > > > > > > index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644
> > > > > > > > > > > > > --- a/gcc/doc/passes.texi
> > > > > > > > > > > > > +++ b/gcc/doc/passes.texi
> > > > > > > > > > > > > @@ -709,7 +709,8 @@ loop.
> > > > > > > > > > > > > > The pass is implemented in @file{tree-vectorizer.c} (the main driver),
> > > > > > > > > > > > > > @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
> > > > > > > > > > > > > > and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> > > > > > > > > > > > > > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
> > > > > > > > > > > > > > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
> > > > > > > > > > > > > > +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
> > > > > > > > > > > > > > Analysis of data references is in @file{tree-data-ref.c}.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > SLP Vectorization. This pass performs vectorization of straight-line code. The
> > > > > > > > > > > > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > > > > > > > > > > > > index 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1148df8cf4a509b5 100644
> > > > > > > > > > > > > --- a/gcc/internal-fn.def
> > > > > > > > > > > > > +++ b/gcc/internal-fn.def
> > > > > > > > > > > > > > @@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
> > > > > > > > > > > > > > DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> > > > > > > > > > > > > > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> > > > > > > > > > > > > > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > > > > > > > > > > > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> > > > > > > > > > > > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> > > > > > > > > > > > > +
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* FP scales. */
> > > > > > > > > > > > > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > > > > > > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > > > > > > > > > > > index 5607f51e6b4b775a92d1d8ffcd3e9b53e9270d6c..e9727def4dbf941bb9ac8b56f83f8ea0f52b262c 100644
> > > > > > > > > > > > > --- a/gcc/optabs.def
> > > > > > > > > > > > > +++ b/gcc/optabs.def
> > > > > > > > > > > > > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
> > > > > > > > > > > > > OPTAB_D (atanh_optab, "atanh$a2")
> > > > > > > > > > > > > OPTAB_D (copysign_optab, "copysign$F$a3")
> > > > > > > > > > > > > OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > > > > > > > > > > > > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > > > > > > > > > > > > +OPTAB_D (cadd270_optab, "cadd270$a3")
> > > > > > > > > > > > > OPTAB_D (cos_optab, "cos$a2")
> > > > > > > > > > > > > OPTAB_D (cosh_optab, "cosh$a2")
> > > > > > > > > > > > > OPTAB_D (exp10_optab, "exp10$a2")
> > > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c
> > > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > > index 0000000000000000000000000000000000000000..3b1e0837a323364c55094240b21dcc4938fa37c2
> > > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int8_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c
> > > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > > index 0000000000000000000000000000000000000000..33d3d13d629bb831272609c484c78e6d19a7b930
> > > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int32_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c
> > > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > > index 0000000000000000000000000000000000000000..54d0f1d6864c41fc656eeb1af32736ad37dcf381
> > > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int64_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
> > > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > > index 0000000000000000000000000000000000000000..fac77f7b626c985e4b033818a10f126784d5a9a6
> > > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int8_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..41a836c10c8f2f45a521912186ab8ac5393f69fd
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int32_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..175f51c46d125578520b5205c86ca8a836174a2f
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int64_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..c4fe72712a4d90bb5e89e6f6b2359029715c0bd8
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int16_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..534a4201d54f73e0419c99a59955900b473107c8
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint8_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..9e3cf8062668b87962e0c71710579939f950651c
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint32_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..398fc94154c88f2f9088910e50c3c1d4cc0ce17f
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint64_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..7326d29d86c27056705c6287da41dd0b85d5cc35
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint16_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..c1ce663dc7ab09875a06ad50381acc955dfd1fff
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int16_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..8d0c817fdae8e6ff6cdc665d6a132b4fc322ea61
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint8_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..3b08ecd0dd80f949ab88d7e747602bb99fea7acc
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint32_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..4e069ee8297064dcad7447fff6012a10a34543e3
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint64_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..88d21abd3c8ee59901df645cf5c036c548cc6b1c
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint16_t
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..e8b8b19d1708673b17564b31d22df3443d667277
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c
> > > > > > > > > > > > > @@ -0,0 +1,60 @@
> > > > > > > > > > > > > +void add90 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i+=2)
> > > > > > > > > > > > > +    {
> > > > > > > > > > > > > +      c[i] = a[i] - b[i+1];
> > > > > > > > > > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add270 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i+=2)
> > > > > > > > > > > > > +    {
> > > > > > > > > > > > > +      c[i] = a[i] + b[i+1];
> > > > > > > > > > > > > +      c[i+1] = a[i+1] - b[i];
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void addMixed (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i+=4)
> > > > > > > > > > > > > +    {
> > > > > > > > > > > > > +      c[i] = a[i] - b[i+1];
> > > > > > > > > > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > > > > > > > > > +      c[i+2] = a[i+2] + b[i+3];
> > > > > > > > > > > > > +      c[i+3] = a[i+3] - b[i+2];
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add90HandUnrolled (TYPE a[restrict N], TYPE b[restrict N],
> > > > > > > > > > > > > +                        TYPE c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < (N /2); i+=4)
> > > > > > > > > > > > > +    {
> > > > > > > > > > > > > +      c[i] = a[i] - b[i+1];
> > > > > > > > > > > > > +      c[i+2] = a[i+2] - b[i+3];
> > > > > > > > > > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > > > > > > > > > +      c[i+3] = a[i+3] + b[i+2];
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add90Hybrid (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N],
> > > > > > > > > > > > > +                  TYPE d[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i+=2)
> > > > > > > > > > > > > +    {
> > > > > > > > > > > > > +      c[i] = a[i] - b[i+1];
> > > > > > > > > > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > > > > > > > > > +      d[i] = a[i] - b[i];
> > > > > > > > > > > > > +      d[i+1] = a[i+1] - b[i+1];
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
> > > > > > > > > > > > > \ No newline at end of file
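For reference, the lane-wise loops in complex-add-pattern-template.c are the interleaved (real, imaginary) spelling of a rotated complex addition. Writing a = x + yi and b = u + vi, a + b*I = (x - v) + (y + u)i, which is the add90 shape, and a + b*I*I*I = a - b*I = (x + v) + (y - u)i, which is the add270 shape; addMixed just alternates the two per complex element. A small standalone sketch that checks this against _Complex arithmetic, assuming double lanes; add90_lanes, add270_lanes and check_rotations are hypothetical names and not part of the patch:

```c
#include <complex.h>

/* Hypothetical lane count: LANES / 2 complex values stored as (re, im).  */
#define LANES 8

/* Same shape as add90 in the template: c = a + b * I.  */
static void
add90_lanes (const double *a, const double *b, double *c)
{
  for (int i = 0; i < LANES; i += 2)
    {
      c[i] = a[i] - b[i + 1];      /* re: x - v */
      c[i + 1] = a[i + 1] + b[i];  /* im: y + u */
    }
}

/* Same shape as add270: c = a + b * I * I * I, i.e. a - b * I.  */
static void
add270_lanes (const double *a, const double *b, double *c)
{
  for (int i = 0; i < LANES; i += 2)
    {
      c[i] = a[i] + b[i + 1];      /* re: x + v */
      c[i + 1] = a[i + 1] - b[i];  /* im: y - u */
    }
}

/* Return 1 if both lane-wise loops agree with _Complex arithmetic.
   The inputs are small dyadic values, so every operation is exact and
   comparing with != is meaningful.  */
int
check_rotations (void)
{
  double a[LANES] = { 1.0, 3.0, 2.0, 3.5, 1.0, 3.0, 2.0, 3.5 };
  double b[LANES] = { 1.5, 3.25, 2.5, 3.75, 1.5, 3.25, 2.5, 3.75 };
  double c90[LANES], c270[LANES];

  add90_lanes (a, b, c90);
  add270_lanes (a, b, c270);

  for (int i = 0; i < LANES; i += 2)
    {
      double complex za = a[i] + a[i + 1] * I;
      double complex zb = b[i] + b[i + 1] * I;
      double complex r90 = za + zb * I;
      double complex r270 = za + zb * I * I * I;

      if (creal (r90) != c90[i] || cimag (r90) != c90[i + 1])
        return 0;
      if (creal (r270) != c270[i] || cimag (r270) != c270[i + 1])
        return 0;
    }
  return 1;
}
```

check_rotations () should return 1. Exact inputs are used on purpose; for general inputs a tolerance such as the EP-based FCMP in complex-operations-run.c is the safer comparison.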
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..afe08e867473695f0a742de330944f495bc541d7
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
> > > > > > > > > > > > > @@ -0,0 +1,77 @@
> > > > > > > > > > > > > +void add0 (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +           TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = a[i] + b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add90snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +               TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = a[i] + (b[i] * 1.0i);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add180snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +                TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = a[i] + (b[i] * 1.0i * 1.0i);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add270snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +                TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = a[i] + (b[i] * 1.0i * 1.0i * 1.0i);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add90fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +               TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = (a[i] * 1.0i) + b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add180fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +                TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = (a[i] * 1.0i * 1.0i) + b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void add270fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +                TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = (a[i] * 1.0i * 1.0i * 1.0i) + b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void addconjfst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +                 TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = ~a[i] + b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void addconjsnd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +                 TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = a[i] + ~b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void addconjboth (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > > > > > > > > > +                  TYPE _Complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +  for (int i=0; i < N; i++)
> > > > > > > > > > > > > +    c[i] = ~a[i] + ~b[i];
> > > > > > > > > > > > > +}
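The scalar template leans on two facts worth spelling out. Each multiplication by I rotates by 90 degrees, so the rotated forms a + b*I*I and a + b*I*I*I reduce to a - b and a - b*I respectively; and in GNU C the unary ~ operator applied to a complex operand performs complex conjugation, which is what the addconj* functions rely on. A small sketch that checks these identities on exactly representable operands, assuming GCC (unary ~ on a complex type is a GNU extension, and -pedantic will warn about it); check_identities is a hypothetical helper, not part of the patch:

```c
#include <complex.h>

/* Return 1 if the rotation and conjugation identities hold for A and B.
   Callers should pass small dyadic values so all arithmetic is exact.  */
int
check_identities (double complex a, double complex b)
{
  /* 180 degrees: b * I * I == -b, so the sum is a - b.  */
  if (a + b * I * I != a - b)
    return 0;

  /* 270 degrees: b * I * I * I == -(b * I), so the sum is a - b * I.  */
  if (a + b * I * I * I != a - b * I)
    return 0;

  /* GNU C extension: unary ~ on a complex value is its conjugate.  */
  if (~a != conj (a))
    return 0;

  return 1;
}
```

For example, check_identities (1.0 + 3.0 * I, 1.5 + 3.25 * I) should return 1.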
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..a0348a7041ca384104bc5ab688d941c14e5b7381
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
> > > > > > > > > > > > > @@ -0,0 +1,103 @@
> > > > > > > > > > > > > +/* { dg-do run } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#include <stdio.h>
> > > > > > > > > > > > > +#include <complex.h>
> > > > > > > > > > > > > +#include <string.h>
> > > > > > > > > > > > > +#include <float.h>
> > > > > > > > > > > > > +#include <math.h>
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define PREF old
> > > > > > > > > > > > > +#pragma GCC push_options
> > > > > > > > > > > > > +#pragma GCC optimize ("no-tree-vectorize")
> > > > > > > > > > > > > +# include "complex-operations.c"
> > > > > > > > > > > > > +#pragma GCC pop_options
> > > > > > > > > > > > > +#undef PREF
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define PREF new
> > > > > > > > > > > > > +# include "complex-operations.c"
> > > > > > > > > > > > > +#undef PREF
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE double
> > > > > > > > > > > > > +#define TYPE2 double
> > > > > > > > > > > > > +#define EP pow(2, -45)
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define xstr(s) str(s)
> > > > > > > > > > > > > +#define str(s) #s
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define FCMP(A, B) \
> > > > > > > > > > > > > +  ((fabs (creal (A) - creal (B)) <= EP) \
> > > > > > > > > > > > > +   && (fabs (cimag (A) - cimag (B)) <= EP))
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define CMP(A, B) \
> > > > > > > > > > > > > +  (FCMP(A,B) ? "PASS" : "FAIL")
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define COMPARE(A,B) \
> > > > > > > > > > > > > +  memset (&c1, 0, sizeof (c1)); \
> > > > > > > > > > > > > +  memset (&c2, 0, sizeof (c2)); \
> > > > > > > > > > > > > +  A; B; \
> > > > > > > > > > > > > +  if (!FCMP(c1[0],c2[0]) || !FCMP(c1[1], c2[1])) \
> > > > > > > > > > > > > +    { \
> > > > > > > > > > > > > +      printf ("=> %s vs %s\n", xstr (A), xstr (B)); \
> > > > > > > > > > > > > +      printf ("%a\n", creal (c1[0]) - creal (c2[0])); \
> > > > > > > > > > > > > +      printf ("%a\n", cimag (c1[1]) - cimag (c2[1])); \
> > > > > > > > > > > > > +      printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", \
> > > > > > > > > > > > > +              creal (c1[0]), cimag (c1[0]), \
> > > > > > > > > > > > > +              creal (c2[0]), cimag (c2[0]), CMP (c1[0], c2[0])); \
> > > > > > > > > > > > > +      printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", \
> > > > > > > > > > > > > +              creal (c1[1]), cimag (c1[1]), \
> > > > > > > > > > > > > +              creal (c2[1]), cimag (c2[1]), CMP (c1[1], c2[1])); \
> > > > > > > > > > > > > +      printf ("\n"); \
> > > > > > > > > > > > > +      __builtin_abort (); \
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +int main ()
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + TYPE2 complex a[] = { 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I,
> > > > > 2.0 +
> > > > > > > 3.5
> > > > > > > > > * I,
> > > > > > > > > > > > 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0
> > > > > +
> > > > > > > 3.5 *
> > > > > > > > > I, 1.0
> > > > > > > > > > > > + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 +
> > > > > 3.5 *
> > > > > > > I,
> > > > > > > > > 1.0 +
> > > > > > > > > > > > 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5
> > > > > * I,
> > > > > > > 1.0
> > > > > > > > > + 3.0
> > > > > > > > > > > > * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I,
> > > > > 1.0
> > > > > > > +
> > > > > > > > > 3.0 * I,
> > > > > > > > > > > > 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I };
> > > > > > > > > > > > > + TYPE complex b[] = { 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I,
> > > > > 2.1 +
> > > > > > > 3.6
> > > > > > > > > * I,
> > > > > > > > > > > > 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1
> > > > > +
> > > > > > > 3.6 *
> > > > > > > > > I, 1.1
> > > > > > > > > > > > + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 +
> > > > > 3.6 *
> > > > > > > I,
> > > > > > > > > 1.1 +
> > > > > > > > > > > > 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6
> > > > > * I,
> > > > > > > 1.1
> > > > > > > > > + 3.1
> > > > > > > > > > > > * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I,
> > > > > 1.1
> > > > > > > +
> > > > > > > > > 3.1 * I,
> > > > > > > > > > > > 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I };
> > > > > > > > > > > > > + TYPE complex c2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > > > > 0,
> > > > > > > 0, 0,
> > > > > > > > > 0, 0,
> > > > > > > > > > > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> > > > > > > > > > > > > + TYPE complex c1[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > > > > 0,
> > > > > > > 0, 0,
> > > > > > > > > 0, 0,
> > > > > > > > > > > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> > > > > > > > > > > > > + TYPE diff1, diff2;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + COMPARE(fma0_old(a, b, c1), fma0_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma90_old(a, b, c1), fma90_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma180_old(a, b, c1), fma180_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma270_old(a, b, c1), fma270_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma0_snd_old(a, b, c1), fma0_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma90_snd_old(a, b, c1), fma90_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma180_snd_old(a, b, c1), fma180_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma270_snd_old(a, b, c1), fma270_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma_conj_first_old(a, b, c1), fma_conj_first_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma_conj_second_old(a, b, c1), fma_conj_second_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fma_conj_both_old(a, b, c1), fma_conj_both_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms0_old(a, b, c1), fms0_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms90_old(a, b, c1), fms90_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms180_old(a, b, c1), fms180_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms270_old(a, b, c1), fms270_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms0_snd_old(a, b, c1), fms0_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms90_snd_old(a, b, c1), fms90_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms180_snd_old(a, b, c1), fms180_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms270_snd_old(a, b, c1), fms270_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms_conj_first_old(a, b, c1), fms_conj_first_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms_conj_second_old(a, b, c1), fms_conj_second_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(fms_conj_both_old(a, b, c1), fms_conj_both_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul0_old(a, b, c1), mul0_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul90_old(a, b, c1), mul90_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul180_old(a, b, c1), mul180_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul270_old(a, b, c1), mul270_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul0_snd_old(a, b, c1), mul0_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul90_snd_old(a, b, c1), mul90_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul180_snd_old(a, b, c1), mul180_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul270_snd_old(a, b, c1), mul270_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul_conj_first_old(a, b, c1), mul_conj_first_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul_conj_second_old(a, b, c1), mul_conj_second_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(mul_conj_both_old(a, b, c1), mul_conj_both_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add0_old(a, b, c1), add0_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add90_old(a, b, c1), add90_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add180_old(a, b, c1), add180_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add270_old(a, b, c1), add270_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add0_snd_old(a, b, c1), add0_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add90_snd_old(a, b, c1), add90_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add180_snd_old(a, b, c1), add180_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add270_snd_old(a, b, c1), add270_snd_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add_conj_first_old(a, b, c1), add_conj_first_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add_conj_second_old(a, b, c1), add_conj_second_new(a, b, c2));
> > > > > > > > > > > > > + COMPARE(add_conj_both_old(a, b, c1), add_conj_both_new(a, b, c2));
> > > > > > > > > > > > > +}
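The COMPARE macro used above is not defined in this excerpt, and the functions it wraps return void, so it presumably runs both variants and then checks c1 against c2. Below is a minimal sketch of such an element-wise check. The helper name complex_arrays_close and the relative tolerance are assumptions for illustration, not what the testsuite actually does. A tolerance can matter because the vectorized "new" variants may contract multiplies and adds differently from the scalar "old" ones under fast-math.

```c
#include <complex.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not the testsuite's COMPARE macro: returns true when
   X and Y agree element-wise, in both the real and the imaginary part, to
   within TOL scaled by the magnitude of the reference element X[i].  */
static bool
complex_arrays_close (const double complex *x, const double complex *y,
                      size_t n, double tol)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Scale by |re| + |im| of the reference, but never below 1.0, so
         values near zero are effectively compared with an absolute bound.  */
      double scale = fabs (creal (x[i])) + fabs (cimag (x[i]));
      if (scale < 1.0)
        scale = 1.0;

      if (fabs (creal (x[i]) - creal (y[i])) > tol * scale
          || fabs (cimag (x[i]) - cimag (y[i])) > tol * scale)
        return false;
    }
  return true;
}
```

For example, after running fma0_old (a, b, c1) and fma0_new (a, b, c2), one could check complex_arrays_close (c1, c2, N, 1e-12).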
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..fdce995481d23c6a536293c8ee59eaf9ca9239bf
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
> > > > > > > > > > > > > @@ -0,0 +1,358 @@
> > > > > > > > > > > > > +#include <stdio.h>
> > > > > > > > > > > > > +#include <complex.h>
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#ifndef PREF
> > > > > > > > > > > > > +#define PREF c
> > > > > > > > > > > > > +#endif
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define FX(N,P) P ## _ ## N
> > > > > > > > > > > > > +#define MK(N,P) FX(P,N)
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define N 32
> > > > > > > > > > > > > +#define TYPE double
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// ------ FMA
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex FMA instructions rotating the result
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * b[i] * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * b[i] * I * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * b[i] * I * I * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex FMA instructions rotating the second parameter.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * (b[i] * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * (b[i] * I * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * (b[i] * I * I * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex FMA instructions with conjugated values.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += conj (a[i]) * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += a[i] * conj (b[i]);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fma_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] += conj (a[i]) * conj (b[i]);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// ----- FMS
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex FMS instructions rotating the result
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * b[i] * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * b[i] * I * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * b[i] * I * I * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex FMS instructions rotating the second parameter.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * (b[i] * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * (b[i] * I * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * (b[i] * I * I * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex FMS instructions with conjugated values.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= conj (a[i]) * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= a[i] * conj (b[i]);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(fms_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] -= conj (a[i]) * conj (b[i]);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// ----- MUL
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex MUL instructions rotating the result
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * b[i] * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * b[i] * I * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * b[i] * I * I * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex MUL instructions rotating the second parameter.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * (b[i] * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * (b[i] * I * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * (b[i] * I * I * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex MUL instructions with conjugated values.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = conj (a[i]) * b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] * conj (b[i]);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(mul_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = conj (a[i]) * conj (b[i]);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// ----- ADD
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex ADD instructions rotating the result
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] + b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = (a[i] + b[i]) * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = (a[i] + b[i]) * I * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = (a[i] + b[i]) * I * I * I;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex ADD instructions rotating the second parameter.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] + b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] + (b[i] * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] + (b[i] * I * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] + (b[i] * I * I * I);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +// Complex ADD instructions with conjugated values.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = conj (a[i]) + b[i];
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = a[i] + conj (b[i]);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +__attribute__((noinline,noipa))
> > > > > > > > > > > > > +void MK(add_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + for (int i=0; i < N; i++)
> > > > > > > > > > > > > + c[i] = conj (a[i]) + conj (b[i]);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
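The two second-operand rotations this patch targets, c = a + (b * I) and c = a + (b * I * I * I), need no multiplication at all. Rotating b by 90 degrees maps (re, im) to (-im, re), so each output lane is a single add or subtract against the swapped lane of b. The sketch below spells those identities out for scalar double complex values. The helper names add_rot90 and add_rot270 are hypothetical, and this only illustrates the arithmetic, not how the vectorizer represents it internally.

```c
#include <complex.h>

#ifndef CMPLX
/* Assumed fallback for pre-C11 headers; adequate for the finite values used
   here, but unlike CMPLX it does not preserve every infinity/NaN case.  */
#define CMPLX(re, im) ((double complex) ((re) + (im) * I))
#endif

/* a + (b * I): b * I == -cimag (b) + creal (b) * I, so rotating b by
   90 degrees swaps its parts and negates the new real part.  */
static double complex
add_rot90 (double complex a, double complex b)
{
  return CMPLX (creal (a) - cimag (b), cimag (a) + creal (b));
}

/* a + (b * I * I * I): three quarter turns equal a - (b * I), so the add
   and the subtract trade places relative to add_rot90.  */
static double complex
add_rot270 (double complex a, double complex b)
{
  return CMPLX (creal (a) + cimag (b), cimag (a) - creal (b));
}
```

With the test's inputs a = 1.0 + 3.0 * I and b = 1.1 + 3.1 * I, add_rot90 (a, b) gives roughly -2.1 + 4.1 * I, the same value as a + (b * I).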
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..b5c252b176c7c21c9484574edc9a56d9d142e13c
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE double
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..1a08e00bcede874d6acac9e2ebece5851c583530
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_float } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE float
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..e4d5c55c0a88f4ac8d45262ee13632443318931f
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_half } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE _Float16
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..6dd3f98a7a52b21f0365cd6c4394b20927a6a320
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE double
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..3d02cd455340e9510ae536d8d109b39f811743f0
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_float } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE float
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..51dcd2724f51cb2d91f0aa234abc39f92275aa42
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_half } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE _Float16
> > > > > > > > > > > > > +#define N 16
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..606b8992b4890e4e2213157761bfac62f72aa40e
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE double
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..5c640f0b14107b7cb8ad1535975d266e00b1d1b2
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_float } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE float
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..6111356cbd4a9c86a9356bf67470512db44cfed2
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_half } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE _Float16
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..00f383d8cfddd1176cf4894ac7fd4d0ae9bcb297
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE double
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..ed108b14a3b704819a3c425b4d19d1103aeb432d
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_float } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE float
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..aa239445a6563ea0ee15751a7f6a989fb1c9d9a7
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c
> > > > > > > > > > > > > @@ -0,0 +1,8 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_half } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE _Float16
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..4001f689671e0973b64665e6b9ea96c755277fae
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int8_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..1f006556af09027f22cefe129475bd7e977054a0
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int32_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..1e82657abf8316228e13651d111b7d256d0f266f
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int64_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..db72e147c9dc4511fb46a036679b7ba77b97ffe3
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int8_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..8d350d69ae0eefba073aba8ae7b3da4b39c845df
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int32_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..c8e56cd4f91bc6254a5fb2177b1f2484859bcf98
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int64_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..2c54d756c9b2f54352d6dba97ccf05d37865cbaa
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int16_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..f54b903aa308a5dc68654b9ffd0a0c230f58e4cc
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint8_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..96824f16b821236f5499dcb90454e72a1326df5c
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint32_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..8bd9f077b233eaf6e0c4ff4df9b97c109df7d002
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint64_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..7e5154d73703512dceda39e37f0ebd0eb7c2e057
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint16_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-pattern-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..ca0d618b991255f3ba34ee40fb876fd053e8121b
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE int16_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..925cfc2ea27b0d4ffbdadfb86abc5c198f57469d
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint8_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..6a70c6ebf0586c11a17cb1ad2cadd0d5927c2aca
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-int.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint32_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..084080aeb4386bf41b0e23d0c684917b2b0435d1
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-long.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint64_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
> > > > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..1379608a60310fd26b18e3db2b6294c28bf5bf2e
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-short.c
> > > > > > > > > > > > > @@ -0,0 +1,9 @@
> > > > > > > > > > > > > +/* { dg-do compile } */
> > > > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define TYPE uint16_t
> > > > > > > > > > > > > +#define N 200
> > > > > > > > > > > > > +#include <stdint.h>
> > > > > > > > > > > > > +#include "complex-add-template.c"
> > > > > > > > > > > > > \ No newline at end of file
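The wrappers above only set TYPE and N and include complex-add-template.c or complex-add-pattern-template.c, neither of which is part of this excerpt. As a rough, hypothetical illustration of the arithmetic being matched, c = a + (b * I) and c = a + (b * I * I * I) expand as follows over interleaved (real, imaginary) arrays. The names add_rot90 and add_rot270 are made up for this sketch, float stands in for TYPE, and this is not the contents of the real templates.

```c
#include <stddef.h>

/* Hypothetical sketch, not the actual test template.
   Arrays hold n floats as interleaved (real, imag) pairs; n must be even.  */

/* Rotation by 90 degrees: c = a + b * I
   re(c) = re(a) - im(b), im(c) = im(a) + re(b).  */
static void
add_rot90 (const float *a, const float *b, float *c, size_t n)
{
  for (size_t i = 0; i < n; i += 2)
    {
      c[i]     = a[i]     - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}

/* Rotation by 270 degrees: c = a + b * I * I * I, i.e. c = a - b * I
   re(c) = re(a) + im(b), im(c) = im(a) - re(b).  */
static void
add_rot270 (const float *a, const float *b, float *c, size_t n)
{
  for (size_t i = 0; i < n; i += 2)
    {
      c[i]     = a[i]     + b[i + 1];
      c[i + 1] = a[i + 1] - b[i];
    }
}
```

For a = 1 + 2i and b = 10 + 20i, add_rot90 gives -19 + 12i and add_rot270 gives 21 - 8i; the alternating subtract/add on swapped lanes is the shape a complex-add-with-rotation instruction collapses into one operation.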
> > > > > > > > > > > > > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > > > > > > > > > > > > index 22acda2a74fdfa51aebbc311d5cc84763b0ffc63..baa5e4a569263edda2125bd8aca6f5b19bbad783 100644
> > > > > > > > > > > > > --- a/gcc/testsuite/lib/target-supports.exp
> > > > > > > > > > > > > +++ b/gcc/testsuite/lib/target-supports.exp
> > > > > > > > > > > > > @@ -3355,7 +3355,102 @@ proc check_effective_target_vect_int { } {
> > > > > > > > > > > > > }}]
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -# Return 1 if the target supports signed int->float conversion
> > > > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of complex additions of
> > > > > > > > > > > > > +# byte, 0 otherwise.
> > > > > > > > > > > > > +#
> > > > > > > > > > > > > +# This won't change for different subtargets so cache the result.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_vect_complex_add_byte { } {
> > > > > > > > > > > > > + return [check_cached_effective_target_indexed vect_complex_add_byte {
> > > > > > > > > > > > > + expr {
> > > > > > > > > > > > > + [check_effective_target_aarch64_sve2]
> > > > > > > > > > > > > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > > > > > > > > > + }}]
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of complex additions of
> > > > > > > > > > > > > +# short, 0 otherwise.
> > > > > > > > > > > > > +#
> > > > > > > > > > > > > +# This won't change for different subtargets so cache the result.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_vect_complex_add_short { } {
> > > > > > > > > > > > > + return [check_cached_effective_target_indexed vect_complex_add_short {
> > > > > > > > > > > > > + expr {
> > > > > > > > > > > > > + [check_effective_target_aarch64_sve2]
> > > > > > > > > > > > > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > > > > > > > > > + }}]
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of complex additions of
> > > > > > > > > > > > > +# int, 0 otherwise.
> > > > > > > > > > > > > +#
> > > > > > > > > > > > > +# This won't change for different subtargets so cache the result.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_vect_complex_add_int { } {
> > > > > > > > > > > > > + return [check_cached_effective_target_indexed vect_complex_add_int {
> > > > > > > > > > > > > + expr {
> > > > > > > > > > > > > + [check_effective_target_aarch64_sve2]
> > > > > > > > > > > > > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > > > > > > > > > + }}]
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of complex additions of
> > > > > > > > > > > > > +# long, 0 otherwise.
> > > > > > > > > > > > > +#
> > > > > > > > > > > > > +# This won't change for different subtargets so cache the result.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_vect_complex_add_long { } {
> > > > > > > > > > > > > + return [check_cached_effective_target_indexed vect_complex_add_long {
> > > > > > > > > > > > > + expr {
> > > > > > > > > > > > > + [check_effective_target_aarch64_sve2]
> > > > > > > > > > > > > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > > > > > > > > > + }}]
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of complex additions of
> > > > > > > > > > > > > +# half, 0 otherwise.
> > > > > > > > > > > > > +#
> > > > > > > > > > > > > +# This won't change for different subtargets so cache the result.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_vect_complex_add_half { } {
> > > > > > > > > > > > > + return [check_cached_effective_target_indexed vect_complex_add_half {
> > > > > > > > > > > > > + expr {
> > > > > > > > > > > > > + ([check_effective_target_arm_v8_3a_complex_neon_ok]
> > > > > > > > > > > > > + && [check_effective_target_arm_v8_2a_fp16_neon_ok])
> > > > > > > > > > > > > + || [check_effective_target_aarch64_sve2]
> > > > > > > > > > > > > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > > > > > > > > > + }}]
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of complex additions of
> > > > > > > > > > > > > +# float, 0 otherwise.
> > > > > > > > > > > > > +#
> > > > > > > > > > > > > +# This won't change for different subtargets so cache the result.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_vect_complex_add_float { } {
> > > > > > > > > > > > > + return [check_cached_effective_target_indexed vect_complex_add_float {
> > > > > > > > > > > > > + expr {
> > > > > > > > > > > > > + [check_effective_target_arm_v8_3a_complex_neon_ok]
> > > > > > > > > > > > > + || [check_effective_target_aarch64_sve2]
> > > > > > > > > > > > > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > > > > > > > > > + }}]
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of complex additions of
> > > > > > > > > > > > > +# double, 0 otherwise.
> > > > > > > > > > > > > +#
> > > > > > > > > > > > > +# This won't change for different subtargets so cache the result.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_vect_complex_add_double { } {
> > > > > > > > > > > > > + return [check_cached_effective_target_indexed vect_complex_add_double {
> > > > > > > > > > > > > + expr {
> > > > > > > > > > > > > + [check_effective_target_arm_v8_3a_complex_neon_ok]
> > > > > > > > > > > > > + || [check_effective_target_aarch64_sve2]
> > > > > > > > > > > > > + || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > > > > > > > > > + }}]
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +# Return 1 if the target supports signed int->float conversion
> > > > > > > > > > > > > #
> > > > > > > > > > > > >
> > > > > > > > > > > > > proc check_effective_target_vect_intfloat_cvt { } {
> > > > > > > > > > > > > @@ -10367,7 +10462,7 @@ proc check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } {
> > > > > > > > > > > > > set et_arm_v8_3a_complex_neon_flags ""
> > > > > > > > > > > > >
> > > > > > > > > > > > > if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> > > > > > > > > > > > > - return 0;
> > > > > > > > > > > > > + return 0;
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > # Iterate through sets of options to find the compiler flags that
> > > > > > > > > > > > > @@ -10380,11 +10475,11 @@ proc check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } {
> > > > > > > > > > > > > #endif
> > > > > > > > > > > > > } "$flags -march=armv8.3-a"] } {
> > > > > > > > > > > > > set et_arm_v8_3a_complex_neon_flags "$flags -march=armv8.3-a"
> > > > > > > > > > > > > - return 1
> > > > > > > > > > > > > + return 1;
> > > > > > > > > > > > > }
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > - return 0;
> > > > > > > > > > > > > + return 0;
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > proc check_effective_target_arm_v8_3a_complex_neon_ok { } {
> > > > > > > > > > > > > @@ -10400,13 +10495,57 @@ proc add_options_for_arm_v8_3a_complex_neon { flags } {
> > > > > > > > > > > > > return "$flags $et_arm_v8_3a_complex_neon_flags"
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +# Return 1 if the target supports ARMv8.3 Adv.SIMD + FP16 Complex
> > > > > > > > > > > > > +# instructions, 0 otherwise. The test is valid for ARM and for AArch64.
> > > > > > > > > > > > > +# Record the command line options needed.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache { } {
> > > > > > > > > > > > > + global et_arm_v8_3a_fp16_complex_neon_flags
> > > > > > > > > > > > > + set et_arm_v8_3a_fp16_complex_neon_flags ""
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> > > > > > > > > > > > > + return 0;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + # Iterate through sets of options to find the compiler flags that
> > > > > > > > > > > > > + # need to be added to the -march option.
> > > > > > > > > > > > > + foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat-abi=hard -mfpu=auto"} {
> > > > > > > > > > > > > + if { [check_no_compiler_messages_nocache \
> > > > > > > > > > > > > + arm_v8_3a_fp16_complex_neon_ok object {
> > > > > > > > > > > > > + #if !defined (__ARM_FEATURE_COMPLEX)
> > > > > > > > > > > > > + #error "__ARM_FEATURE_COMPLEX not defined"
> > > > > > > > > > > > > + #endif
> > > > > > > > > > > > > + } "$flags -march=armv8.3-a+fp16"] } {
> > > > > > > > > > > > > + set et_arm_v8_3a_fp16_complex_neon_flags \
> > > > > > > > > > > > > + "$flags -march=armv8.3-a+fp16"
> > > > > > > > > > > > > + return 1;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return 0;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok { } {
> > > > > > > > > > > > > + return [check_cached_effective_target arm_v8_3a_fp16_complex_neon_ok \
> > > > > > > > > > > > > + check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache]
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +proc add_options_for_arm_v8_3a_fp16_complex_neon { flags } {
> > > > > > > > > > > > > + if { ![check_effective_target_arm_v8_3a_fp16_complex_neon_ok] } {
> > > > > > > > > > > > > + return "$flags"
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + global et_arm_v8_3a_fp16_complex_neon_flags
> > > > > > > > > > > > > + return "$flags $et_arm_v8_3a_fp16_complex_neon_flags"
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
> > > > > > > > > > > > > # Return 1 if the target supports executing AdvSIMD instructions from ARMv8.3
> > > > > > > > > > > > > # with the complex instruction extension, 0 otherwise. The test is valid for
> > > > > > > > > > > > > # ARM and for AArch64.
> > > > > > > > > > > > >
> > > > > > > > > > > > > proc check_effective_target_arm_v8_3a_complex_neon_hw { } {
> > > > > > > > > > > > > if { ![check_effective_target_arm_v8_3a_complex_neon_ok] } {
> > > > > > > > > > > > > - return 0;
> > > > > > > > > > > > > + return 0;
> > > > > > > > > > > > > }
> > > > > > > > > > > > > return [check_runtime arm_v8_3a_complex_neon_hw_available {
> > > > > > > > > > > > > #include "arm_neon.h"
> > > > > > > > > > > > > @@ -10431,7 +10570,7 @@ proc check_effective_target_arm_v8_3a_complex_neon_hw { } {
> > > > > > > > > > > > > : /* No clobbers. */);
> > > > > > > > > > > > > #endif
> > > > > > > > > > > > >
> > > > > > > > > > > > > - return (results[0] == 8 && results[1] == 24) ? 1 : 0;
> > > > > > > > > > > > > + return (results[0] == 8 && results[1] == 24) ? 1 : 0;
> > > > > > > > > > > > > }
> > > > > > > > > > > > > } [add_options_for_arm_v8_3a_complex_neon ""]]
> > > > > > > > > > > > > }
> > > > > > > > > > > > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000000000000000000000000000000000000..aeb402289277c4bb48b62b7e9e074850a99d3182
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/gcc/tree-vect-slp-patterns.c
> > > > > > > > > > > > > @@ -0,0 +1,739 @@
> > > > > > > > > > > > > +/* SLP - Pattern matcher on SLP trees
> > > > > > > > > > > > > + Copyright (C) 2020 Free Software Foundation, Inc.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +This file is part of GCC.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +GCC is free software; you can redistribute it and/or modify it under
> > > > > > > > > > > > > +the terms of the GNU General Public License as published by the Free
> > > > > > > > > > > > > +Software Foundation; either version 3, or (at your option) any later
> > > > > > > > > > > > > +version.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > > > > > > > > > > > > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > > > > > > > > > > > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> > > > > > > > > > > > > +for more details.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +You should have received a copy of the GNU General Public License
> > > > > > > > > > > > > +along with GCC; see the file COPYING3. If not see
> > > > > > > > > > > > > +<http://www.gnu.org/licenses/>. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#include "config.h"
> > > > > > > > > > > > > +#include "system.h"
> > > > > > > > > > > > > +#include "coretypes.h"
> > > > > > > > > > > > > +#include "backend.h"
> > > > > > > > > > > > > +#include "target.h"
> > > > > > > > > > > > > +#include "rtl.h"
> > > > > > > > > > > > > +#include "tree.h"
> > > > > > > > > > > > > +#include "gimple.h"
> > > > > > > > > > > > > +#include "tree-pass.h"
> > > > > > > > > > > > > +#include "ssa.h"
> > > > > > > > > > > > > +#include "optabs-tree.h"
> > > > > > > > > > > > > +#include "insn-config.h"
> > > > > > > > > > > > > +#include "recog.h" /* FIXME: for insn_data */
> > > > > > > > > > > > > +#include "fold-const.h"
> > > > > > > > > > > > > +#include "stor-layout.h"
> > > > > > > > > > > > > +#include "gimple-iterator.h"
> > > > > > > > > > > > > +#include "cfgloop.h"
> > > > > > > > > > > > > +#include "tree-vectorizer.h"
> > > > > > > > > > > > > +#include "langhooks.h"
> > > > > > > > > > > > > +#include "gimple-walk.h"
> > > > > > > > > > > > > +#include "dbgcnt.h"
> > > > > > > > > > > > > +#include "tree-vector-builder.h"
> > > > > > > > > > > > > +#include "vec-perm-indices.h"
> > > > > > > > > > > > > +#include "gimple-fold.h"
> > > > > > > > > > > > > +#include "internal-fn.h"
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* SLP Pattern matching mechanism.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + This extension to the SLP vectorizer allows one to transform the generated SLP
> > > > > > > > > > > > > + tree based on any pattern. The difference between this and the normal vect
> > > > > > > > > > > > > + pattern matcher is that, unlike the former, this matcher allows you to match
> > > > > > > > > > > > > + with instructions that do not belong to the same SSA dominator graph.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + The only requirement that this pattern matcher has is that you are only
> > > > > > > > > > > > > + allowed to either match an entire group or none.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + The pattern matcher currently only allows you to perform replacements to
> > > > > > > > > > > > > + internal functions.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + Matching is one-way: once patterns are matched, they cannot be undone. It is
> > > > > > > > > > > > > + currently not supported to match patterns recursively.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + To add a new pattern, implement the vect_pattern class and add the type to
> > > > > > > > > > > > > + slp_patterns.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +*/
> > > > > > > > > > > > > +
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/*******************************************************************************
> > > > > > > > > > > > > + * vect_pattern class
> > > > > > > > > > > > > + ******************************************************************************/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Default implementation of recognize that performs matching, validation and
> > > > > > > > > > > > > + replacement of nodes but that can be overridden if required. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static bool
> > > > > > > > > > > > > +vect_pattern_validate_optab (internal_fn ifn, slp_tree node)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + tree vectype = SLP_TREE_VECTYPE (node);
> > > > > > > > > > > > > + if (ifn == IFN_LAST || !vectype)
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (dump_enabled_p ())
> > > > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > > > > > > > + "Found %s pattern in SLP tree\n",
> > > > > > > > > > > > > + internal_fn_name (ifn));
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + if (dump_enabled_p ())
> > > > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > > > > > > > + "Target supports %s vectorization with mode %T\n",
> > > > > > > > > > > > > + internal_fn_name (ifn), vectype);
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + else
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + if (dump_enabled_p ())
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + if (!vectype)
> > > > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > > > > > > > + "Target does not support vector type for %T\n",
> > > > > > > > > > > > > + SLP_TREE_DEF_TYPE (node));
> > > > > > > > > > > > > + else
> > > > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > > > > > > > + "Target does not support %s for vector type "
> > > > > > > > > > > > > + "%T\n", internal_fn_name (ifn), vectype);
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + return true;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/*******************************************************************************
> > > > > > > > > > > > > + * General helper types
> > > > > > > > > > > > > + ******************************************************************************/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can
> > > > > > > > > > > > > + be matched when looking for expressions that we are interested in matching
> > > > > > > > > > > > > + for complex number addition and MLA. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +typedef enum _complex_operation : unsigned {
> > > > > > > > > > > > > + PLUS_PLUS,
> > > > > > > > > > > > > + MINUS_PLUS,
> > > > > > > > > > > > > + PLUS_MINUS,
> > > > > > > > > > > > > + MULT_MULT,
> > > > > > > > > > > > > + CMPLX_NONE
> > > > > > > > > > > > > +} complex_operation_t;
> > > > > > > > > > > > > +
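The `PLUS_MINUS`/`MINUS_PLUS` pairs above are easier to follow next to the scalar arithmetic of a complex addition with rotation. The sketch below assumes each complex value is stored as an adjacent (real, imaginary) pair, with the real part in the even lane. The `cplx` type and the function names are hypothetical and not part of the patch. Multiplying by I maps (re, im) to (-im, re). Rotating the second operand by 90 degrees therefore gives a subtract in the real lane and an add in the imaginary lane (read in lane order, a MINUS_PLUS pair). Rotating by 270 degrees gives the reverse (PLUS_MINUS). That lane-order mapping is an interpretation, not something the quoted code states.

```c
/* Hypothetical scalar model: one complex value stored as an adjacent
   (real, imaginary) pair, real part first.  */
typedef struct { float re, im; } cplx;

/* Rotation by 90 degrees: c = a + b * I.
   Since b * I = (-b.im, b.re), the real lane subtracts and the
   imaginary lane adds.  */
static cplx cadd_rot90 (cplx a, cplx b)
{
  cplx c = { a.re - b.im, a.im + b.re };
  return c;
}

/* Rotation by 270 degrees: c = a + b * I * I * I = a - b * I.
   The real lane adds and the imaginary lane subtracts.  */
static cplx cadd_rot270 (cplx a, cplx b)
{
  cplx c = { a.re + b.im, a.im - b.re };
  return c;
}
```

For example, with a = 1 + 2i and b = 3 + 4i, rotation 90 yields -3 + 5i and rotation 270 yields 5 - 1i.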
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/*******************************************************************************
> > > > > > > > > > > > > + * General helper functions
> > > > > > > > > > > > > + ******************************************************************************/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Helper function of linear_loads_p that checks to see if the load permutation
> > > > > > > > > > > > > + is sequential and in monotonically increasing order of loads with no gaps.
> > > > > > > > > > > > > +*/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static inline bool
> > > > > > > > > > > > > +is_linear_load_p (load_permutation_t loads)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + if (loads.length() == 0)
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + unsigned leader = loads[0];
> > > > > > > > > > > > > + unsigned load, i;
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
> > > > > > > > > > > > > + if (load != ++leader)
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > + return true;
> > > > > > > > > > > > > +}
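The same linearity check can be written as a standalone sketch over a plain array, assuming `unsigned` load positions. `is_linear_loads` is a hypothetical name. As in `is_linear_load_p`, the sequence does not have to start at 0. It only has to be non-empty and increase by exactly one at each step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array-based mirror of is_linear_load_p: true when LOADS
   is non-empty and each element is exactly one more than the previous.  */
static bool is_linear_loads (const unsigned *loads, size_t n)
{
  if (n == 0)
    return false;

  unsigned leader = loads[0];
  for (size_t i = 1; i < n; i++)
    if (loads[i] != ++leader)
      return false;
  return true;
}
```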
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Check to see if all loads rooted in ROOT are linear. Linearity is
> > > > > > > > > > > > > + defined as having no gaps between values loaded. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static load_permutation_t
> > > > > > > > > > > > > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root,
> > > > > > > > > > > > > + bool *linear)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + *linear = false;
> > > > > > > > > > > > > + if (!root)
> > > > > > > > > > > > > + return vNULL;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + unsigned i;
> > > > > > > > > > > > > + load_permutation_t loads = vNULL;
> > > > > > > > > > > > > + load_permutation_t *tmp;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if ((tmp = perm_cache->get (root)) != NULL)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + *linear = is_linear_load_p (*tmp);
> > > > > > > > > > > > > + return *tmp;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + perm_cache->put (root, vNULL);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* If it's a load node, then just read the load permute. */
> > > > > > > > > > > > > + if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + loads = SLP_TREE_LOAD_PERMUTATION (root);
> > > > > > > > > > > > > + perm_cache->put (root, loads);
> > > > > > > > > > > > > + if (!is_linear_load_p (loads))
> > > > > > > > > > > > > + return loads;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + loads.create (SLP_TREE_LANES (root));
> > > > > > > > > > > > > + tree op;
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + if (TREE_CODE (op) != SSA_NAME)
> > > > > > > > > > > > > + return vNULL;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + gimple *defstmt = SSA_NAME_DEF_STMT (op);
> > > > > > > > > > > > > + if (!is_gimple_assign (defstmt))
> > > > > > > > > > > > > + return vNULL;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + switch (gimple_assign_rhs_code (defstmt))
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + case IMAGPART_EXPR:
> > > > > > > > > > > > > + loads.safe_push (1);
> > > > > > > > > > > > > + break;
> > > > > > > > > > > > > + case REALPART_EXPR:
> > > > > > > > > > > > > + loads.safe_push (0);
> > > > > > > > > > > > > + break;
> > > > > > > > > > > > > + default:
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + loads.release ();
> > > > > > > > > > > > > + return vNULL;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + perm_cache->put (root, loads);
> > > > > > > > > > > > > + if (!is_linear_load_p (loads))
> > > > > > > > > > > > > + return loads;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> > > > > > > > > > > > > + return vNULL;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + auto_vec<load_permutation_t> all_loads;
> > > > > > > > > > > > > + bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + slp_tree child;
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + loads = linear_loads_p (perm_cache, child, linear);
> > > > > > > > > > > > > + if ((!*linear && !is_perm) || !loads.exists ())
> > > > > > > > > > > > > + return loads;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + all_loads.safe_push (loads);
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (is_perm)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
> > > > > > > > > > > > > + load_permutation_t nloads;
> > > > > > > > > > > > > + nloads.create (SLP_TREE_LANES (root));
> > > > > > > > > > > > > + nloads.quick_grow (SLP_TREE_LANES (root));
> > > > > > > > > > > > > + for (i = 0; i < SLP_TREE_LANES (root); i++)
> > > > > > > > > > > > > + nloads[i] = all_loads[perm[i].first][perm[i].second];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + perm_cache->put (root, nloads);
> > > > > > > > > > > > > + if (!is_linear_load_p (nloads))
> > > > > > > > > > > > > + return nloads;
> > > > > > > > > > > > > + loads = nloads;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + perm_cache->put (root, loads);
> > > > > > > > > > > > > + *linear = true;
> > > > > > > > > > > > > + return loads;
> > > > > > > > > > > > > +}
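The `is_perm` branch above derives a node's loads by indexing the children's load permutations through the node's lane permutation. The following is a minimal C sketch of that composition step. It uses a hypothetical stand-in type instead of GCC's `vec` and `lane_permutation_t`. For example, suppose one child loads lanes {0, 2} and the other loads {1, 3}. A lane permute that interleaves them yields {0, 1, 2, 3}, which is linear.

```c
#include <stddef.h>

/* Hypothetical stand-in for one lane_permutation_t entry:
   read lane LANE of child CHILD.  */
typedef struct { unsigned child, lane; } lane_ref;

/* Mirror of the is_perm branch in linear_loads_p:
   out[i] = child_loads[perm[i].child][perm[i].lane].  */
static void compose_loads (const unsigned *const *child_loads,
                           const lane_ref *perm, size_t lanes,
                           unsigned *out)
{
  for (size_t i = 0; i < lanes; i++)
    out[i] = child_loads[perm[i].child][perm[i].lane];
}
```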
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* This function attempts to make a node rooted in NODE with parent PARENT
> > > > > > > > > > > > > + linear. If the node is already linear then the node itself is returned
> > > > > > > > > > > > > + in RESULT.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + If the node is not linear then a new VEC_PERM_EXPR node is created with a
> > > > > > > > > > > > > + lane permute that, when applied, will make the node linear. If such a
> > > > > > > > > > > > > + permute cannot be created then FALSE is returned from the function.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + Here linearity is defined as having a sequential, monotonically increasing
> > > > > > > > > > > > > + load position inside the load permute generated by the loads reachable from
> > > > > > > > > > > > > + NODE. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static bool
> > > > > > > > > > > > > +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache,
> > > > > > > > > > > > > + slp_tree parent, slp_tree node, slp_tree *result)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + bool is_linear = false;
> > > > > > > > > > > > > + unsigned x, val;
> > > > > > > > > > > > > + load_permutation_t load_perm = linear_loads_p (perm_cache, node, &is_linear);
> > > > > > > > > > > > > + if (is_linear)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + *result = node;
> > > > > > > > > > > > > + SLP_TREE_REF_COUNT (node)++;
> > > > > > > > > > > > > + return true;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Attempt to linearise the permute. */
> > > > > > > > > > > > > + vec<std::pair<unsigned, unsigned> > zipped;
> > > > > > > > > > > > > + zipped.create (load_perm.length ());
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (load_perm, x, val)
> > > > > > > > > > > > > + zipped.quick_push (std::make_pair (val, x));
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + typedef const std::pair<unsigned, unsigned>* cmp_t;
> > > > > > > > > > > > > + zipped.qsort ([](const void *a, const void *b) -> int
> > > > > > > > > > > > > + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Verify if we have a linear permute sequence. */
> > > > > > > > > > > > > + if (zipped.length () > 0)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + unsigned leader = zipped[0].first;
> > > > > > > > > > > > > + for (x = 1; x < zipped.length (); x++)
> > > > > > > > > > > > > + if(!(is_linear = (zipped[x].first == ++leader)))
> > > > > > > > > > > > > + break;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (!is_linear)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + if (dump_enabled_p ())
> > > > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > > > > > > > + "Loads could not be made linear %p\n",
> > > > > > > > > > > > > + node);
> > > > > > > > > > > > > + zipped.release ();
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + for (x = 0; x < zipped.length (); x++)
> > > > > > > > > > > > > + zipped[x].first = 0;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Create the new permute node and store it instead. */
> > > > > > > > > > > > > + slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > > > > > > > > > > > > + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > > > > > > > > > > > > + SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
> > > > > > > > > > > > > + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > > > > > > > > > > > > + SLP_TREE_CHILDREN (vnode).quick_push (node);
> > > > > > > > > > > > > + SLP_TREE_REF_COUNT (vnode) = 1;
> > > > > > > > > > > > > + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
> > > > > > > > > > > > > + SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent);
> > > > > > > > > > > > > + SLP_TREE_REF_COUNT (node)++;
> > > > > > > > > > > > > + *result = vnode;
> > > > > > > > > > > > > + return is_linear;
> > > > > > > > > > > > > +}
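The linearisation in `vect_slp_make_linear` works in four steps. It pairs each load position with its lane, sorts the pairs by position, and checks that the sorted positions are consecutive. It then uses the original lane indices as the new lane permute. The following is a plain-C sketch of that idea. The names are hypothetical, and the 64-lane cap is an assumption that only serves to keep the example allocation-free. Applying the returned permute means output lane `i` reads input lane `perm_out[i]`, so `loads[perm_out[i]]` comes out linear.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical (load position, original lane) pair, like the
   std::pair zipped in vect_slp_make_linear.  */
typedef struct { unsigned load, lane; } zip_t;

/* Order pairs by load position.  */
static int cmp_load (const void *a, const void *b)
{
  unsigned x = ((const zip_t *) a)->load;
  unsigned y = ((const zip_t *) b)->load;
  return (x > y) - (x < y);
}

/* Sort lanes by load position; if the sorted positions are consecutive,
   write into PERM_OUT the input lane each output lane must read and
   return true.  Otherwise return false.  Assumes N <= 64.  */
static bool make_linear_perm (const unsigned *loads, size_t n,
                              unsigned *perm_out)
{
  zip_t zipped[64];
  if (n == 0 || n > 64)
    return false;

  for (size_t i = 0; i < n; i++)
    zipped[i] = (zip_t) { loads[i], (unsigned) i };
  qsort (zipped, n, sizeof zipped[0], cmp_load);

  /* Duplicates or gaps mean no permute can make the loads linear.  */
  for (size_t i = 1; i < n; i++)
    if (zipped[i].load != zipped[0].load + i)
      return false;

  for (size_t i = 0; i < n; i++)
    perm_out[i] = zipped[i].lane;
  return true;
}
```

The comparator returns `(x > y) - (x < y)` instead of subtracting the values as `int`s. For the small load indices seen here both approaches work, but the former cannot overflow.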
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Checks whether the expression represented by NODE is a gimple assign with
> > > > > > > > > > > > > + code CODE. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static inline bool
> > > > > > > > > > > > > +vect_match_expression_p (slp_tree node, tree_code code)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + if (!node
> > > > > > > > > > > > > + || !SLP_TREE_REPRESENTATIVE (node))
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
> > > > > > > > > > > > > + if (!is_gimple_assign (expr)
> > > > > > > > > > > > > + || gimple_assign_rhs_code (expr) != code)
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return true;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Check if the given lane permute in PERMUTES matches an alternating sequence
> > > > > > > > > > > > > + of {P0 P1 P0 P1 ...}. This is to account for unrolled loops. Furthermore,
> > > > > > > > > > > > > + the resulting permute must be linear. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static inline bool
> > > > > > > > > > > > > +vect_check_lane_permute (lane_permutation_t &permutes,
> > > > > > > > > > > > > + unsigned p0, unsigned p1)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + if (permutes.length () == 0)
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + unsigned val[2] = {p0, p1};
> > > > > > > > > > > > > + unsigned seed = permutes[0].second;
> > > > > > > > > > > > > + for (unsigned i = 0; i < permutes.length (); i++)
> > > > > > > > > > > > > + if (permutes[i].first != val[i % 2]
> > > > > > > > > > > > > + || permutes[i].second != seed++)
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return true;
> > > > > > > > > > > > > +}
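The check in vect_check_lane_permute can be tried out on its own. This sketch assumes that a lane permutation is a vector of (operand, lane) pairs, and check_lane_permute is a hypothetical standalone name. With P0 = 0 and P1 = 1, the lane permutation { 0[0] 1[1] } from the SLP tree example passes, and so does its unrolled form { 0[0] 1[1] 0[2] 1[3] }.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lane permutation: (operand index, lane) pairs.
using lane_perm = std::vector<std::pair<unsigned, unsigned>>;

// Mirrors vect_check_lane_permute: operand indices must alternate
// p0, p1, p0, p1, ... while lane indices increase by one from the first.
bool check_lane_permute (const lane_perm &perm, unsigned p0, unsigned p1)
{
  if (perm.empty ())
    return false;

  const unsigned val[2] = {p0, p1};
  unsigned seed = perm[0].second;
  for (std::size_t i = 0; i < perm.size (); i++)
    if (perm[i].first != val[i % 2] || perm[i].second != seed++)
      return false;
  return true;
}
```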
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Matches the two gimple expressions representing NODE1 and NODE2 in
> > > > > > > > > > > > > + parallel and returns the pair operation that represents the two
> > > > > > > > > > > > > + expressions in the two statements.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + If the match is successful then the corresponding complex_operation is
> > > > > > > > > > > > > + returned and the arguments to the two matched operations are returned in
> > > > > > > > > > > > > + OPS.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + If TWO_OPERANDS, the LANES of the parent VEC_PERM are expected to select
> > > > > > > > > > > > > + from the two nodes alternately.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + e.g. the following gimple statements
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + stmt 0 _39 = _37 + _12;
> > > > > > > > > > > > > + stmt 1 _6 = _38 - _36;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
> > > > > > > > > > > > > +*/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static complex_operation_t
> > > > > > > > > > > > > +vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t &lanes,
> > > > > > > > > > > > > + bool two_operands = true, vec<slp_tree> *ops = NULL)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + complex_operation_t result = CMPLX_NONE;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (vect_match_expression_p (node1, MINUS_EXPR)
> > > > > > > > > > > > > + && vect_match_expression_p (node2, PLUS_EXPR)
> > > > > > > > > > > > > + && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > > > > > > > > > > > > + result = MINUS_PLUS;
> > > > > > > > > > > > > + else if (vect_match_expression_p (node1, PLUS_EXPR)
> > > > > > > > > > > > > + && vect_match_expression_p (node2, MINUS_EXPR)
> > > > > > > > > > > > > + && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > > > > > > > > > > > > + result = PLUS_MINUS;
> > > > > > > > > > > > > + else if (vect_match_expression_p (node1, PLUS_EXPR)
> > > > > > > > > > > > > + && vect_match_expression_p (node2, PLUS_EXPR))
> > > > > > > > > > > > > + result = PLUS_PLUS;
> > > > > > > > > > > > > + else if (vect_match_expression_p (node1, MULT_EXPR)
> > > > > > > > > > > > > + && vect_match_expression_p (node2, MULT_EXPR))
> > > > > > > > > > > > > + result = MULT_MULT;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (result != CMPLX_NONE && ops != NULL)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + ops->create (2);
> > > > > > > > > > > > > + ops->quick_push (node1);
> > > > > > > > > > > > > + ops->quick_push (node2);
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + return result;
> > > > > > > > > > > > > +}
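The decision logic of vect_detect_pair_op can be sketched outside GCC. Here, plain enums stand in for the tree codes and the complex_operation_t values, and all names are hypothetical. The lane-permute check only gates the mixed MINUS/PLUS cases, and only when TWO_OPERANDS is set. PLUS_PLUS and MULT_MULT are accepted without it.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's tree codes and complex_operation_t.
enum class code { plus, minus, mult, other };
enum class pair_op { none, minus_plus, plus_minus, plus_plus, mult_mult };

// LANES_OK stands for the result of vect_check_lane_permute (lanes, 0, 1),
// which is only consulted when TWO_OPERANDS is set.
pair_op detect_pair_op (code c1, code c2, bool two_operands, bool lanes_ok)
{
  const bool perm_ok = !two_operands || lanes_ok;
  if (c1 == code::minus && c2 == code::plus && perm_ok)
    return pair_op::minus_plus;
  if (c1 == code::plus && c2 == code::minus && perm_ok)
    return pair_op::plus_minus;
  if (c1 == code::plus && c2 == code::plus)
    return pair_op::plus_plus;
  if (c1 == code::mult && c2 == code::mult)
    return pair_op::mult_mult;
  return pair_op::none;
}
```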
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Overload of vect_detect_pair_op that matches against the representative
> > > > > > > > > > > > > + statements in the children of NODE. It is expected that NODE has exactly
> > > > > > > > > > > > > + two children and when TWO_OPERANDS then NODE must be a VEC_PERM. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static complex_operation_t
> > > > > > > > > > > > > +vect_detect_pair_op (slp_tree node, bool two_operands = true,
> > > > > > > > > > > > > + vec<slp_tree> *ops = NULL)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> > > > > > > > > > > > > + return CMPLX_NONE;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (SLP_TREE_CHILDREN (node).length () != 2)
> > > > > > > > > > > > > + return CMPLX_NONE;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > > > > > > > > > > > > + lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return vect_detect_pair_op (children[0], children[1], lanes, two_operands,
> > > > > > > > > > > > > + ops);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/******************************************************************************
> > > > > > > > > > > > > + * complex_pattern class
> > > > > > > > > > > > > + *****************************************************************************/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* SLP Complex Numbers pattern matching.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + As an example, the following simple loop:
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + double a[restrict N]; double b[restrict N]; double c[restrict N];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + for (int i=0; i < N; i+=2)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + c[i] = a[i] - b[i+1];
> > > > > > > > > > > > > + c[i+1] = a[i+1] + b[i];
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + This represents a complex addition with a rotation of 90 degrees around the
> > > > > > > > > > > > > + Argand plane, i.e. if `a` and `b` were complex numbers then this would be
> > > > > > > > > > > > > + the same as `a + (b * I)`.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + Here the expressions for `c[i]` and `c[i+1]` are independent, but both have
> > > > > > > > > > > > > + to be recognized in order for the pattern to work. As an SLP tree this is
> > > > > > > > > > > > > + represented as
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +                +--------------------------------+
> > > > > > > > > > > > > +                |       stmt 0 *_9 = _10;        |
> > > > > > > > > > > > > +                |       stmt 1 *_15 = _16;       |
> > > > > > > > > > > > > +                +--------------------------------+
> > > > > > > > > > > > > +                                |
> > > > > > > > > > > > > +                                |
> > > > > > > > > > > > > +                                v
> > > > > > > > > > > > > +                +--------------------------------+
> > > > > > > > > > > > > +                |     stmt 0 _10 = _4 - _8;      |
> > > > > > > > > > > > > +                |    stmt 1 _16 = _12 + _14;     |
> > > > > > > > > > > > > +                | lane permutation { 0[0] 1[1] } |
> > > > > > > > > > > > > +                +--------------------------------+
> > > > > > > > > > > > > +                            |        |
> > > > > > > > > > > > > +                            |        |
> > > > > > > > > > > > > +                            |        |
> > > > > > > > > > > > > +               +-----+      |        |      +-----+
> > > > > > > > > > > > > +               |     |      |        |      |     |
> > > > > > > > > > > > > +         +-----| { } |<-----+        +----->| { } --------+
> > > > > > > > > > > > > +         |     |     |   +------------------|     |       |
> > > > > > > > > > > > > +         |     +-----+   |                  +-----+       |
> > > > > > > > > > > > > +         |        |      |                                |
> > > > > > > > > > > > > +         |        |      |                                |
> > > > > > > > > > > > > +         |        +------|------------------+             |
> > > > > > > > > > > > > +         |               |                  |             |
> > > > > > > > > > > > > +         v               v                  v             v
> > > > > > > > > > > > > +     +--------------------------+     +--------------------------------+
> > > > > > > > > > > > > +     |     stmt 0 _8 = *_7;     |     |        stmt 0 _4 = *_3;        |
> > > > > > > > > > > > > +     |    stmt 1 _14 = *_13;    |     |       stmt 1 _12 = *_11;       |
> > > > > > > > > > > > > +     | load permutation { 1 0 } |     |    load permutation { 0 1 }    |
> > > > > > > > > > > > > +     +--------------------------+     +--------------------------------+
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + The pattern matcher allows you to replace either both statements 0 and 1 or
> > > > > > > > > > > > > + neither of them. Because this is a two-operand operation, the nodes
> > > > > > > > > > > > > + actually being replaced are the { } nodes. The scalar statements
> > > > > > > > > > > > > + themselves are not replaced or used during the matching; instead, the
> > > > > > > > > > > > > + SLP_TREE_REPRESENTATIVE statements are inspected. You are also allowed to
> > > > > > > > > > > > > + replace and match on any number of nodes.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + Because the pattern matcher matches on the representative statement of the
> > > > > > > > > > > > > + SLP node, in the two_operators case it can match the children of the node.
> > > > > > > > > > > > > + This is done using the method `recognize ()`.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +*/
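The following sketch shows why the loop in the comment above is a rotate-by-90 addition. It runs the same scalar loop on interleaved {real, imag} arrays and compares the result with std::complex arithmetic. The loop body is copied from the comment. The function names and the interleaved layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Runs the scalar loop from the comment on interleaved {re, im} arrays:
//   c[i]   = a[i]   - b[i+1];
//   c[i+1] = a[i+1] + b[i];
void add_rot90_interleaved (const double *a, const double *b, double *c,
                            std::size_t n)
{
  for (std::size_t i = 0; i < n; i += 2)
    {
      c[i] = a[i] - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}

// Reference result computed with std::complex: a + b * I.
std::complex<double> rot90_reference (std::complex<double> a,
                                      std::complex<double> b)
{
  const std::complex<double> I (0.0, 1.0);
  return a + b * I;
}
```

For a = 1 + 2i and b = 3 + 4i, both give -3 + 5i. The real lane takes b's imaginary part and the imaginary lane takes b's real part, which is the { 1 0 } load permutation in the diagram.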
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* The complex_pattern class contains common code for pattern matchers that
> > > > > > > > > > > > > + work on complex numbers. These provide functionality to allow
> > > > > > > > > > > > > + deconstruction and validation of sequences depicting/transforming REAL and
> > > > > > > > > > > > > + IMAG pairs. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +class complex_pattern : public vect_pattern
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + protected:
> > > > > > > > > > > > > + auto_vec<slp_tree> m_workset;
> > > > > > > > > > > > > + complex_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > > > > > > > > > > > > + : vect_pattern (node, m_ops, ifn)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + this->m_workset.safe_push (*node);
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + public:
> > > > > > > > > > > > > + void build (slp_tree_to_load_perm_map_t *, vec_info *);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + static internal_fn
> > > > > > > > > > > > > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > > > > > > > > > > > > + vec<slp_tree> *);
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Creates a replacement pattern statement for each node in m_node and inserts
> > > > > > > > > > > > > + the new statement into m_node as the new representative statement. The
> > > > > > > > > > > > > + old statement is marked as being in a pattern defined by the new
> > > > > > > > > > > > > + statement. The statement is created as a call to internal function IFN
> > > > > > > > > > > > > + with m_num_args arguments.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + Furthermore, the new pattern is also added to the vectorization
> > > > > > > > > > > > > + information structure VINFO and the old statement STMT_INFO is marked as
> > > > > > > > > > > > > + unused while the new statement is marked as used and the number of SLP
> > > > > > > > > > > > > + uses of the new statement is incremented.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + The newly created SLP nodes are marked as SLP only and will be dissolved
> > > > > > > > > > > > > + if SLP is aborted.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + The new gimple call is not inserted into the BB, which remains unchanged.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + This default method is designed to only match against simple operands
> > > > > > > > > > > > > + where all the input and output types are the same.
> > > > > > > > > > > > > +*/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +void
> > > > > > > > > > > > > +complex_pattern::build (slp_tree_to_load_perm_map_t *perm_cache,
> > > > > > > > > > > > > + vec_info *vinfo)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + stmt_vec_info stmt_info;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + auto_vec<tree> args;
> > > > > > > > > > > > > + args.create (this->m_num_args);
> > > > > > > > > > > > > + args.quick_grow_cleared (this->m_num_args);
> > > > > > > > > > > > > + slp_tree node;
> > > > > > > > > > > > > + unsigned ix;
> > > > > > > > > > > > > + stmt_vec_info call_stmt_info;
> > > > > > > > > > > > > + gcall *call_stmt = NULL;
> > > > > > > > > > > > > + auto_vec<slp_tree> nodes;
> > > > > > > > > > > > > + slp_tree tmp = NULL;
> > > > > > > > > > > > > + node = this->m_ops[0];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* First re-arrange the children. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), ix, tmp)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + slp_tree vnode = NULL;
> > > > > > > > > > > > > + if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > > > > > > > > > > > > + nodes.safe_push (vnode);
> > > > > > > > > > > > > + else
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (nodes, ix, tmp)
> > > > > > > > > > > > > + vect_free_slp_tree (tmp);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > > > > > > > > > > > > + vect_free_slp_tree (node);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > > > > > > > > > > > > + SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Now modify the nodes themselves. */
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (this->m_workset, ix, node)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + /* Calculate the location of the statement in NODE to replace. */
> > > > > > > > > > > > > + stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > > > > > > > > > > > > + gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> > > > > > > > > > > > > + tree lhs_old_stmt = gimple_get_lhs (old_stmt);
> > > > > > > > > > > > > + tree type = TREE_TYPE (lhs_old_stmt);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Create the argument set for use by gimple_build_call_internal_vec. */
> > > > > > > > > > > > > + for (unsigned i = 0; i < this->m_num_args; i++)
> > > > > > > > > > > > > + args[i] = lhs_old_stmt;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Create the new pattern statements. */
> > > > > > > > > > > > > + call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
> > > > > > > > > > > > > + tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
> > > > > > > > > > > > > + gimple_call_set_lhs (call_stmt, var);
> > > > > > > > > > > > > + gimple_set_location (call_stmt, gimple_location (old_stmt));
> > > > > > > > > > > > > + gimple_call_set_nothrow (call_stmt, true);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Adjust the book-keeping for the new and old statements for use during
> > > > > > > > > > > > > + SLP. This is required to get the right VF and statement during SLP
> > > > > > > > > > > > > + analysis. These changes are created after relevancy has been set for
> > > > > > > > > > > > > + the nodes, so we need to update them manually. Any changes will be
> > > > > > > > > > > > > + undone if SLP is cancelled. */
> > > > > > > > > > > > > + call_stmt_info
> > > > > > > > > > > > > + = vinfo->add_pattern_stmt (call_stmt, stmt_info);
> > > > > > > > > > > > > + STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Unfortunately we still need this on the new pattern because non-loop
> > > > > > > > > > > > > + SLP doesn't call vect_detect_hybrid_slp, so it never updates it. */
> > > > > > > > > > > > > + STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* add_pattern_stmt can't be done in vect_mark_pattern_stmts because the
> > > > > > > > > > > > > + non-SLP pattern matchers have already added the statement to VINFO by
> > > > > > > > > > > > > + the time it is called. Some of them need to modify the returned
> > > > > > > > > > > > > + stmt_info. vect_mark_pattern_stmts is called by recog_pattern, and
> > > > > > > > > > > > > + making the call there would add boilerplate code to every pattern. */
> > > > > > > > > > > > > + vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
> > > > > > > > > > > > > + SLP_TREE_VECTYPE (node));
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Since we are replacing all the statements in the group with the same
> > > > > > > > > > > > > + thing it doesn't really matter. So just set it every time a new stmt
> > > > > > > > > > > > > + is created. */
> > > > > > > > > > > > > + SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
> > > > > > > > > > > > > + SLP_TREE_CODE (node) = CALL_EXPR;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/******************************************************************************
> > > > > > > > > > > > > + * complex_add_pattern class
> > > > > > > > > > > > > + *****************************************************************************/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +class complex_add_pattern : public complex_pattern
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + protected:
> > > > > > > > > > > > > + complex_add_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > > > > > > > > > > > > + : complex_pattern (node, m_ops, ifn)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + this->m_num_args = 2;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + public:
> > > > > > > > > > > > > + static internal_fn
> > > > > > > > > > > > > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > > > > > > > > > > > > + vec<slp_tree> *);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + static vect_pattern*
> > > > > > > > > > > > > + recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Pattern matcher for trying to match the complex addition pattern in an SLP
> > > > > > > > > > > > > + tree.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + If no match is found then IFN_LAST is returned.
> > > > > > > > > > > > > + This function matches the patterns shaped as:
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + c[i] = a[i] - b[i+1];
> > > > > > > > > > > > > + c[i+1] = a[i+1] + b[i];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + If a match occurred then the matching internal function is returned. The
> > > > > > > > > > > > > + initial match is expected to be in OP and the initial match operands in
> > > > > > > > > > > > > + OPS. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +internal_fn
> > > > > > > > > > > > > +complex_add_pattern::matches (complex_operation_t op,
> > > > > > > > > > > > > + slp_tree_to_load_perm_map_t *perm_cache,
> > > > > > > > > > > > > + vec<slp_tree> *ops)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + internal_fn ifn = IFN_LAST;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Find the two components. Rotation in the complex plane will modify
> > > > > > > > > > > > > + the operations:
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + * Rotation 0: + +
> > > > > > > > > > > > > + * Rotation 90: - +
> > > > > > > > > > > > > + * Rotation 180: - -
> > > > > > > > > > > > > + * Rotation 270: + -
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
> > > > > > > > > > > > > + to care about them here. */
> > > > > > > > > > > > > + if (op == MINUS_PLUS)
> > > > > > > > > > > > > + ifn = IFN_COMPLEX_ADD_ROT90;
> > > > > > > > > > > > > + else if (op == PLUS_MINUS)
> > > > > > > > > > > > > + ifn = IFN_COMPLEX_ADD_ROT270;
> > > > > > > > > > > > > + else
> > > > > > > > > > > > > + return ifn;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Verify that there is a permute; otherwise this isn't a pattern we
> > > > > > > > > > > > > + support. */
> > > > > > > > > > > > > + bool is_linear = false;
> > > > > > > > > > > > > + gcc_assert (ops->length () == 2);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + vec<slp_tree> children = SLP_TREE_CHILDREN ((*ops)[0]);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* First node must be unpermuted. */
> > > > > > > > > > > > > + linear_loads_p (perm_cache, children[0], &is_linear);
> > > > > > > > > > > > > + if (!is_linear)
> > > > > > > > > > > > > + return IFN_LAST;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Second node must be permuted. */
> > > > > > > > > > > > > + if (linear_loads_p (perm_cache, children[1], &is_linear).length () > 0
> > > > > > > > > > > > > + && is_linear)
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return ifn;
> > > > > > > > > > > > > +}
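The rotation table in matches () can be checked numerically by expanding a + b * I^k lane by lane. For rotation 90, the lanes are (a.re - b.im, a.im + b.re), the "- +" case with b's lanes swapped. For rotation 270, they are (a.re + b.im, a.im - b.re), the "+ -" case. Rotations 0 and 180 are a plain lane-wise add or subtract with no swap, which is why normal SIMD code covers them. add_rotated is a hypothetical helper for this check.

```cpp
#include <cassert>
#include <complex>

// Computes a + b * I^k for k in {0, 1, 2, 3}, i.e. B rotated by
// 0, 90, 180 or 270 degrees before the addition.
std::complex<double> add_rotated (std::complex<double> a,
                                  std::complex<double> b, int k)
{
  const std::complex<double> I (0.0, 1.0);
  for (int j = 0; j < k; j++)
    b *= I;
  return a + b;
}
```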
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +vect_pattern*
> > > > > > > > > > > > > +complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > > > > > > > > > > > > + slp_tree *node)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + auto_vec<slp_tree> ops;
> > > > > > > > > > > > > + complex_operation_t op
> > > > > > > > > > > > > + = vect_detect_pair_op (*node, true, &ops);
> > > > > > > > > > > > > + internal_fn ifn = complex_add_pattern::matches (op, perm_cache, &ops);
> > > > > > > > > > > > > + if (!vect_pattern_validate_optab (ifn, *node))
> > > > > > > > > > > > > + return NULL;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return new complex_add_pattern (node, &ops, ifn);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/******************************************************************************
> > > > > > > > > > > > > + * Pattern matching definitions
> > > > > > > > > > > > > + *****************************************************************************/
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define SLP_PATTERN(x) &x::recognize
> > > > > > > > > > > > > +vect_pattern_decl_t slp_patterns[]
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + /* For the least amount of backtracking and more efficient matching, order
> > > > > > > > > > > > > + patterns from the largest to the smallest, especially if they overlap in
> > > > > > > > > > > > > + what they can detect. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + SLP_PATTERN (complex_add_pattern),
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +#undef SLP_PATTERN
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Set the number of SLP pattern matchers available. */
> > > > > > > > > > > > > +size_t num__slp_patterns = sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
> > > > > > > > > > > > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > > > > > > > > > > > > index d19874f175703a96b1c1110874067fdbec48c068..7f5fbdbd4969036b5db1cb698da970304c87b03b 100644
> > > > > > > > > > > > > --- a/gcc/tree-vect-slp.c
> > > > > > > > > > > > > +++ b/gcc/tree-vect-slp.c
> > > > > > > > > > > > > @@ -105,7 +105,7 @@ _slp_tree::~_slp_tree ()
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* Recursively free the memory allocated for the SLP tree rooted at NODE. */
> > > > > > > > > > > > >
> > > > > > > > > > > > > -static void
> > > > > > > > > > > > > +void
> > > > > > > > > > > > > vect_free_slp_tree (slp_tree node)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > int i;
> > > > > > > > > > > > > @@ -148,7 +148,7 @@ vect_free_slp_instance (slp_instance instance)
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* Create an SLP node for SCALAR_STMTS. */
> > > > > > > > > > > > >
> > > > > > > > > > > > > -slp_tree
> > > > > > > > > > > > > +static slp_tree
> > > > > > > > > > > > > vect_create_new_slp_node (slp_tree node,
> > > > > > > > > > > > > vec<stmt_vec_info> scalar_stmts, unsigned nops)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > @@ -165,7 +165,7 @@ vect_create_new_slp_node (slp_tree node,
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* Create an SLP node for SCALAR_STMTS. */
> > > > > > > > > > > > >
> > > > > > > > > > > > > -static slp_tree
> > > > > > > > > > > > > +slp_tree
> > > > > > > > > > > > > vect_create_new_slp_node (vec<stmt_vec_info> scalar_stmts, unsigned nops)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > return vect_create_new_slp_node (new _slp_tree, scalar_stmts, nops);
> > > > > > > > > > > > > @@ -2175,6 +2175,84 @@ calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
> > > > > > > > > > > > > return exact_div (common_multiple (nunits, group_size), group_size);
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/* Helper function of vect_match_slp_patterns.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + Attempts to match patterns against the SLP tree rooted in REF_NODE using
> > > > > > > > > > > > > + VINFO. Patterns are matched in post-order traversal.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + Returns true if any pattern matched in the tree rooted in REF_NODE, in
> > > > > > > > > > > > > + which case the matched nodes have been updated in place; otherwise the
> > > > > > > > > > > > > + tree is left unchanged. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static bool
> > > > > > > > > > > > > +vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> > > > > > > > > > > > > + slp_tree_to_load_perm_map_t *perm_cache,
> > > > > > > > > > > > > + hash_set<slp_tree> *visited)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + unsigned i;
> > > > > > > > > > > > > + slp_tree node = *ref_node;
> > > > > > > > > > > > > + bool found_p = false;
> > > > > > > > > > > > > + if (!node || visited->add (node))
> > > > > > > > > > > > > + return false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + slp_tree child;
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > > > > > > > > > > > > + found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
> > > > > > > > > > > > > + vinfo, perm_cache, visited);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + for (unsigned x = 0; x < num__slp_patterns; x++)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
> > > > > > > > > > > > > + if (pattern)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + pattern->build (perm_cache, vinfo);
> > > > > > > > > > > > > + delete pattern;
> > > > > > > > > > > > > + found_p = true;
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return found_p;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Applies pattern matching to the given SLP tree rooted in REF_NODE using
> > > > > > > > > > > > > + vec_info VINFO.
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + The modified tree is returned. Patterns are tried in order and multiple
> > > > > > > > > > > > > + patterns may match. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static bool
> > > > > > > > > > > > > +vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> > > > > > > > > > > > > + hash_set<slp_tree> *visited,
> > > > > > > > > > > > > + slp_tree_to_load_perm_map_t *perm_cache,
> > > > > > > > > > > > > + scalar_stmts_to_slp_tree_map_t * /* bst_map */)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > > > > > > > > > > > > + slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (dump_enabled_p ())
> > > > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > > > > > > > + "Analyzing SLP tree %p for patterns\n",
> > > > > > > > > > > > > + SLP_INSTANCE_TREE (instance));
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + bool found_p
> > > > > > > > > > > > > + = vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + if (found_p)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + if (dump_enabled_p ())
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > > > > > > > + "Pattern matched SLP tree\n");
> > > > > > > > > > > > > + vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return found_p;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Analyze an SLP instance starting from a group of grouped stores. Call
> > > > > > > > > > > > > + vect_build_slp_tree to build a tree of packed stmts if possible.
> > > > > > > > > > > > > + Return FALSE if it's impossible to SLP any stmt in the loop. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > static bool
> > > > > > > > > > > > > vect_analyze_slp_instance (vec_info *vinfo,
> > > > > > > > > > > > > scalar_stmts_to_slp_tree_map_t *bst_map,
> > > > > > > > > > > > > @@ -2540,6 +2618,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > unsigned int i;
> > > > > > > > > > > > > stmt_vec_info first_element;
> > > > > > > > > > > > > + slp_instance instance;
> > > > > > > > > > > > >
> > > > > > > > > > > > > DUMP_VECT_SCOPE ("vect_analyze_slp");
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -2586,6 +2665,13 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
> > > > > > > > > > > > > slp_inst_kind_reduc_group, max_tree_size);
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > + hash_set<slp_tree> visited_patterns;
> > > > > > > > > > > > > + slp_tree_to_load_perm_map_t perm_cache;
> > > > > > > > > > > > > + /* See if any patterns can be found in the SLP tree. */
> > > > > > > > > > > > > + FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
> > > > > > > > > > > > > + vect_match_slp_patterns (instance, vinfo, &visited_patterns, &perm_cache,
> > > > > > > > > > > > > + bst_map);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > /* The map keeps a reference on SLP nodes built, release that. */
> > > > > > > > > > > > > for (scalar_stmts_to_slp_tree_map_t::iterator it = bst_map->begin ();
> > > > > > > > > > > > > it != bst_map->end (); ++it)
> > > > > > > > > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > > > > > > > > > index 91e2e10761d591b99ad55467e4719219ea5c0e49..ea39f56365e6c6fcbaaeb9cde769a81a109d6af3 100644
> > > > > > > > > > > > > --- a/gcc/tree-vectorizer.h
> > > > > > > > > > > > > +++ b/gcc/tree-vectorizer.h
> > > > > > > > > > > > > @@ -27,6 +27,7 @@ typedef class _stmt_vec_info *stmt_vec_info;
> > > > > > > > > > > > > #include "tree-hash-traits.h"
> > > > > > > > > > > > > #include "target.h"
> > > > > > > > > > > > > #include "alloc-pool.h"
> > > > > > > > > > > > > +#include "internal-fn.h"
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* Used for naming of new temporaries. */
> > > > > > > > > > > > > @@ -1994,6 +1995,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
> > > > > > > > > > > > > extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
> > > > > > > > > > > > > extern bool vect_update_shared_vectype (stmt_vec_info, tree);
> > > > > > > > > > > > > extern slp_tree vect_create_new_slp_node (vec<stmt_vec_info>, unsigned);
> > > > > > > > > > > > > +extern void vect_free_slp_tree (slp_tree);
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* In tree-vect-patterns.c. */
> > > > > > > > > > > > > extern void
> > > > > > > > > > > > > @@ -2010,4 +2012,67 @@ void vect_free_loop_info_assumptions (class loop *);
> > > > > > > > > > > > > gimple *vect_loop_vectorized_call (class loop *, gcond **cond = NULL);
> > > > > > > > > > > > > bool vect_stmt_dominates_stmt_p (gimple *, gimple *);
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/* SLP Pattern matcher types, tree-vect-slp-patterns.c. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Forward declaration of possible two operands operation that can be matched
> > > > > > > > > > > > > + by the complex numbers pattern matchers. */
> > > > > > > > > > > > > +enum _complex_operation : unsigned;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Cache from nodes to the load permutation they represent. */
> > > > > > > > > > > > > +typedef hash_map <slp_tree, load_permutation_t >
> > > > > > > > > > > > > + slp_tree_to_load_perm_map_t;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Vector pattern matcher base class. All SLP pattern matchers must inherit
> > > > > > > > > > > > > + from this type. */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +class vect_pattern
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + protected:
> > > > > > > > > > > > > + /* The number of arguments that the IFN requires. */
> > > > > > > > > > > > > + unsigned m_num_args;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* The internal function that will be used when a pattern is created. */
> > > > > > > > > > > > > + internal_fn m_ifn;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* The current node being inspected. */
> > > > > > > > > > > > > + slp_tree *m_node;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* The list of operands to be the children for the node produced when the
> > > > > > > > > > > > > + internal function is created. */
> > > > > > > > > > > > > + vec<slp_tree> m_ops;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Default constructor where NODE is the root of the tree to inspect. */
> > > > > > > > > > > > > + vect_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + this->m_ifn = ifn;
> > > > > > > > > > > > > + this->m_node = node;
> > > > > > > > > > > > > + this->m_ops.create (0);
> > > > > > > > > > > > > + this->m_ops.safe_splice (*m_ops);
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + public:
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Create a new instance of the pattern matcher class of the given type. */
> > > > > > > > > > > > > + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Build the pattern from the data collected so far. */
> > > > > > > > > > > > > + virtual void build (slp_tree_to_load_perm_map_t *, vec_info *) = 0;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + /* Default destructor. */
> > > > > > > > > > > > > + virtual ~vect_pattern ()
> > > > > > > > > > > > > + {
> > > > > > > > > > > > > + this->m_ops.release ();
> > > > > > > > > > > > > + }
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Function pointer to create a new pattern matcher from a generic type. */
> > > > > > > > > > > > > +typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
> > > > > > > > > > > > > + slp_tree *);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* List of supported pattern matchers. */
> > > > > > > > > > > > > +extern vect_pattern_decl_t slp_patterns[];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Number of supported pattern matchers. */
> > > > > > > > > > > > > +extern size_t num__slp_patterns;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > #endif /* GCC_TREE_VECTORIZER_H */
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > > > > > > > > > Germany; GF: Felix Imend
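The quoted vect_match_slp_patterns_2 recurses into a node's children before looking at the node itself (post-order), skips nodes already recorded in VISITED, and then calls every entry of slp_patterns[] in table order, so more than one pattern may fire on the same node. Below is a simplified C model of that control flow only. struct node, pattern_fn, match_patterns and the two toy patterns are hypothetical stand-ins, not GCC types, and the toy rewrite does none of the real build () work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for slp_tree and slp_patterns[]; not GCC code.  */
#define MAX_NODES 16

struct node
{
  int id;               /* index into the visited array */
  const char *op;       /* operation label, e.g. "ADDSUB" */
  struct node *child[2];/* NULL when absent */
};

/* A matcher may rewrite *REF and returns true if it matched.  */
typedef bool (*pattern_fn) (struct node **ref);

/* Order in which patterns were tried on nodes, recorded by pattern 1.  */
static int visit_log[MAX_NODES];
static int visit_len;

/* Pattern 1: never matches, only records the traversal order.  */
static bool
record_pattern (struct node **ref)
{
  visit_log[visit_len++] = (*ref)->id;
  return false;
}

/* Pattern 2: toy relabelling standing in for a complex-add matcher.  */
static bool
addsub_pattern (struct node **ref)
{
  if (strcmp ((*ref)->op, "ADDSUB") != 0)
    return false;
  (*ref)->op = "CADD90";
  return true;
}

static pattern_fn patterns[] = { record_pattern, addsub_pattern };
static const size_t num_patterns = sizeof patterns / sizeof patterns[0];

/* Post-order walk: children first, then every pattern in table order on
   the node itself.  A node reachable from two parents is handled once.  */
static bool
match_patterns (struct node **ref, bool *visited)
{
  struct node *n = *ref;
  bool found = false;
  if (!n || visited[n->id])
    return false;
  visited[n->id] = true;

  for (int i = 0; i < 2; i++)
    found |= match_patterns (&n->child[i], visited);

  for (size_t p = 0; p < num_patterns; p++)
    if (patterns[p] (ref))
      found = true;

  return found;
}
```

Passing the node by address mirrors slp_tree *ref_node in the patch: the signature lets a matcher replace the node its parent points at, although the toy rewrite here only relabels it.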
--
[-- Attachment #2: pr13812.patch --]
[-- Type: text/x-diff, Size: 93310 bytes --]
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 16be66fefc6bce12e010dcc191dcfde11340f30d..a6f995a10d753b9e515d45258974f3d813aad1db 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1647,6 +1647,7 @@ OBJS = \
tree-vect-loop.o \
tree-vect-loop-manip.o \
tree-vect-slp.o \
+ tree-vect-slp-patterns.o \
tree-vectorizer.o \
tree-vector-builder.o \
tree-vrp.o \
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 573a340c14b17fab2393d3ac2f2a6b7b9b681003..ec6ec180b91fcf9f481b6754c044483787fd923c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6154,6 +6154,54 @@ floating-point mode.
This pattern is not allowed to @code{FAIL}.
+@cindex @code{cadd90@var{m}3} instruction pattern
+@item @samp{cadd90@var{m}3}
+Perform vector add and subtract on even/odd number pairs. The operation being
+matched is semantically described as
+
+@smallexample
+ for (int i = 0; i < N; i += 2)
+ @{
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+ @}
+@end smallexample
+
+This operation is semantically equivalent to performing a vector addition of
+complex numbers in operand 1 with operand 2 rotated by 90 degrees around
+the Argand plane and storing the result in operand 0.
+
+In GCC lane ordering the real part of the number must be in the even lanes with
+the imaginary part in the odd lanes.
+
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cadd270@var{m}3} instruction pattern
+@item @samp{cadd270@var{m}3}
+Perform vector add and subtract on even/odd number pairs. The operation being
+matched is semantically described as
+
+@smallexample
+ for (int i = 0; i < N; i += 2)
+ @{
+ c[i] = a[i] + b[i+1];
+ c[i+1] = a[i+1] - b[i];
+ @}
+@end smallexample
+
+This operation is semantically equivalent to performing a vector addition of
+complex numbers in operand 1 with operand 2 rotated by 270 degrees around
+the Argand plane and storing the result in operand 0.
+
+In GCC lane ordering the real part of the number must be in the even lanes with
+the imaginary part in the odd lanes.
+
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
@cindex @code{ffs@var{m}2} instruction pattern
@item @samp{ffs@var{m}2}
Store into operand 0 one plus the index of the least significant 1-bit
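The md.texi hunk above defines cadd90 and cadd270 by a lane-wise loop and states that they are equivalent to adding operand 2 rotated by 90 or 270 degrees. A small C99 check of that claim, assuming interleaved double storage (real parts in even lanes, imaginary parts in odd lanes) and using <complex.h> arithmetic as the reference. cadd90_model, cadd270_model and count_mismatches are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <complex.h>

/* Eight scalars = four complex values; even lanes real, odd lanes imag.  */
#define NLANES 8

/* Scalar model of cadd90<m>3, copied from the md.texi description.  */
static void
cadd90_model (const double *a, const double *b, double *c)
{
  for (int i = 0; i < NLANES; i += 2)
    {
      c[i] = a[i] - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }
}

/* Scalar model of cadd270<m>3.  */
static void
cadd270_model (const double *a, const double *b, double *c)
{
  for (int i = 0; i < NLANES; i += 2)
    {
      c[i] = a[i] + b[i + 1];
      c[i + 1] = a[i + 1] - b[i];
    }
}

/* Compare the models with a + b * I and a + b * I * I * I.  The inputs are
   small dyadic values, so every result is exact and == is safe.  Returns
   the number of mismatches found.  */
static int
count_mismatches (void)
{
  const double a[NLANES] = { 1.0, 3.0, 2.0, 3.5, -1.0, 0.5, 4.0, -2.0 };
  const double b[NLANES] = { 1.5, 3.0, 2.0, 3.5, 0.25, -1.0, 3.0, 7.0 };
  double c90[NLANES], c270[NLANES];
  int bad = 0;

  cadd90_model (a, b, c90);
  cadd270_model (a, b, c270);

  for (int k = 0; k < NLANES; k += 2)
    {
      double complex ca = a[k] + a[k + 1] * I;
      double complex cb = b[k] + b[k + 1] * I;
      double complex r90 = ca + cb * I;
      double complex r270 = ca + cb * I * I * I;

      if (creal (r90) != c90[k] || cimag (r90) != c90[k + 1])
        bad++;
      if (creal (r270) != c270[k] || cimag (r270) != c270[k + 1])
        bad++;
    }
  return bad;
}
```

A zero result from count_mismatches () means both lane-wise loops agree with the complex-arithmetic reading for these inputs; it is a spot check, not a proof.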
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -709,7 +709,8 @@ loop.
The pass is implemented in @file{tree-vectorizer.c} (the main driver),
@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
-functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
+functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
+@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
Analysis of data references is in @file{tree-data-ref.c}.
SLP Vectorization. This pass performs vectorization of straight-line code. The
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 91a7bfea3eecad747320afcd11e05eedb9f7fcbf..511fe70162b5d9db3a61a5285d31c008f6835487 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
+
/* FP scales. */
DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 5607f51e6b4b775a92d1d8ffcd3e9b53e9270d6c..e9727def4dbf941bb9ac8b56f83f8ea0f52b262c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
OPTAB_D (atanh_optab, "atanh$a2")
OPTAB_D (copysign_optab, "copysign$F$a3")
OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
OPTAB_D (cos_optab, "cos$a2")
OPTAB_D (cosh_optab, "cosh$a2")
OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
new file mode 100644
index 0000000000000000000000000000000000000000..6db7b21910d1ec098a604e94b40e40eb2f8408e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_byte } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE int8_t
+#define N 16
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" { xfail aarch64_sve2 } } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail aarch64_sve2 } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c
new file mode 100644
index 0000000000000000000000000000000000000000..e9bcce5e78cf82a8f1627629cb91dbfbe1503832
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_int } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE int32_t
+#define N 16
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail aarch64_sve2 } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c
new file mode 100644
index 0000000000000000000000000000000000000000..fb6a5dbc5d6c7be48448333e2bcbd6a73bcb77d4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_long } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE int64_t
+#define N 16
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail aarch64_sve2 } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c
new file mode 100644
index 0000000000000000000000000000000000000000..ec0a134f4f11b1e0115ccf896980c596e238c9d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_short } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE int16_t
+#define N 16
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail aarch64_sve2 } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c
new file mode 100644
index 0000000000000000000000000000000000000000..dd6ff946ad023a43ecc1873988fcda54ae1f8d35
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_byte } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE uint8_t
+#define N 16
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" { xfail aarch64_sve2 } } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail aarch64_sve2 } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c
new file mode 100644
index 0000000000000000000000000000000000000000..7cdfabae55d2c047f6cfddfc0a1541217fe636d2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_int } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE uint32_t
+#define N 16
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail aarch64_sve2 } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c
new file mode 100644
index 0000000000000000000000000000000000000000..58451959818a48b6079172bd247c73920009268b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_long } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE uint64_t
+#define N 16
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail aarch64_sve2 } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c
new file mode 100644
index 0000000000000000000000000000000000000000..f22c5e550b380c7ddcbb8441565bd0020f101ac2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_short } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE uint16_t
+#define N 16
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail aarch64_sve2 } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c
new file mode 100644
index 0000000000000000000000000000000000000000..e8b8b19d1708673b17564b31d22df3443d667277
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-template.c
@@ -0,0 +1,60 @@
+void add90 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
+{
+ for (int i=0; i < N; i+=2)
+ {
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+
+void add270 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
+{
+ for (int i=0; i < N; i+=2)
+ {
+ c[i] = a[i] + b[i+1];
+ c[i+1] = a[i+1] - b[i];
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
+
+void addMixed (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
+{
+ for (int i=0; i < N; i+=4)
+ {
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+ c[i+2] = a[i+2] + b[i+3];
+ c[i+3] = a[i+3] - b[i+2];
+ }
+}
+
+void add90HandUnrolled (TYPE a[restrict N], TYPE b[restrict N],
+ TYPE c[restrict N])
+{
+ for (int i=0; i < (N /2); i+=4)
+ {
+ c[i] = a[i] - b[i+1];
+ c[i+2] = a[i+2] - b[i+3];
+ c[i+1] = a[i+1] + b[i];
+ c[i+3] = a[i+3] + b[i+2];
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+
+void add90Hybrid (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N],
+ TYPE d[restrict N])
+{
+ for (int i=0; i < N; i+=2)
+ {
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+ d[i] = a[i] - b[i];
+ d[i+1] = a[i+1] - b[i+1];
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
new file mode 100644
index 0000000000000000000000000000000000000000..42d2591fc7f525026d72fa5be3eb62c80504cd19
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
@@ -0,0 +1,79 @@
+#include <complex.h>
+
+void add0 (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + b[i];
+}
+
+void add90snd (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + (b[i] * I);
+}
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+
+void add180snd (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + (b[i] * I * I);
+}
+
+void add270snd (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + (b[i] * I * I * I);
+}
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
+
+void add90fst (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = (a[i] * I) + b[i];
+}
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+
+void add180fst (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = (a[i] * I * I) + b[i];
+}
+
+void add270fst (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = (a[i] * I * I * I) + b[i];
+}
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
+
+void addconjfst (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = ~a[i] + b[i];
+}
+
+void addconjsnd (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + ~b[i];
+}
+
+void addconjboth (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = ~a[i] + ~b[i];
+}
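In complex-add-template.c only the 90 and 270 degree variants carry scan-tree-dump-times directives; add180snd and add180fst do not. That is consistent with a 180 degree rotation being plain negation (I * I = -1), so a + b * I * I is an ordinary complex subtraction, although this is a reading of the tests rather than something the patch states. A minimal check of the identity, with a hypothetical helper name:

```c
#include <assert.h>
#include <complex.h>

/* Returns 1 when a + b * I * I equals a - b component-wise.  */
static int
rot180_is_sub (double complex a, double complex b)
{
  double complex lhs = a + b * I * I;
  double complex rhs = a - b;
  return creal (lhs) == creal (rhs) && cimag (lhs) == cimag (rhs);
}
```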
diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
new file mode 100644
index 0000000000000000000000000000000000000000..a0348a7041ca384104bc5ab688d941c14e5b7381
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
@@ -0,0 +1,103 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_complex_add_double } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#include <stdio.h>
+#include <complex.h>
+#include <string.h>
+#include <float.h>
+#include <math.h>
+
+#define PREF old
+#pragma GCC push_options
+#pragma GCC optimize ("no-tree-vectorize")
+# include "complex-operations.c"
+#pragma GCC pop_options
+#undef PREF
+
+#define PREF new
+# include "complex-operations.c"
+#undef PREF
+
+#define TYPE double
+#define TYPE2 double
+#define EP pow(2, -45)
+
+#define xstr(s) str(s)
+#define str(s) #s
+
+#define FCMP(A, B) \
+ ((fabs (creal (A) - creal (B)) <= EP) && (fabs (cimag (A) - cimag (B)) <= EP))
+
+#define CMP(A, B) \
+ (FCMP(A,B) ? "PASS" : "FAIL")
+
+#define COMPARE(A,B) \
+ memset (&c1, 0, sizeof (c1)); \
+ memset (&c2, 0, sizeof (c2)); \
+ A; B; \
+ if (!FCMP(c1[0],c2[0]) || !FCMP(c1[1], c2[1])) \
+ { \
+ printf ("=> %s vs %s\n", xstr (A), xstr (B)); \
+ printf ("%a\n", creal (c1[0]) - creal (c2[0])); \
+ printf ("%a\n", cimag (c1[1]) - cimag (c2[1])); \
+ printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[0]), cimag (c1[0]), creal (c2[0]), cimag (c2[0]), CMP (c1[0], c2[0])); \
+ printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[1]), cimag (c1[1]), creal (c2[1]), cimag (c2[1]), CMP (c1[1], c2[1])); \
+ printf ("\n"); \
+ __builtin_abort (); \
+ }
+
+int main ()
+{
+ TYPE2 complex a[] = { 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I };
+ TYPE complex b[] = { 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I };
+ TYPE complex c2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+ TYPE complex c1[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+ TYPE diff1, diff2;
+
+ COMPARE(fma0_old(a, b, c1), fma0_new(a, b, c2));
+ COMPARE(fma90_old(a, b, c1), fma90_new(a, b, c2));
+ COMPARE(fma180_old(a, b, c1), fma180_new(a, b, c2));
+ COMPARE(fma270_old(a, b, c1), fma270_new(a, b, c2));
+ COMPARE(fma0_snd_old(a, b, c1), fma0_snd_new(a, b, c2));
+ COMPARE(fma90_snd_old(a, b, c1), fma90_snd_new(a, b, c2));
+ COMPARE(fma180_snd_old(a, b, c1), fma180_snd_new(a, b, c2));
+ COMPARE(fma270_snd_old(a, b, c1), fma270_snd_new(a, b, c2));
+ COMPARE(fma_conj_first_old(a, b, c1), fma_conj_first_new(a, b, c2));
+ COMPARE(fma_conj_second_old(a, b, c1), fma_conj_second_new(a, b, c2));
+ COMPARE(fma_conj_both_old(a, b, c1), fma_conj_both_new(a, b, c2));
+ COMPARE(fms0_old(a, b, c1), fms0_new(a, b, c2));
+ COMPARE(fms90_old(a, b, c1), fms90_new(a, b, c2));
+ COMPARE(fms180_old(a, b, c1), fms180_new(a, b, c2));
+ COMPARE(fms270_old(a, b, c1), fms270_new(a, b, c2));
+ COMPARE(fms0_snd_old(a, b, c1), fms0_snd_new(a, b, c2));
+ COMPARE(fms90_snd_old(a, b, c1), fms90_snd_new(a, b, c2));
+ COMPARE(fms180_snd_old(a, b, c1), fms180_snd_new(a, b, c2));
+ COMPARE(fms270_snd_old(a, b, c1), fms270_snd_new(a, b, c2));
+ COMPARE(fms_conj_first_old(a, b, c1), fms_conj_first_new(a, b, c2));
+ COMPARE(fms_conj_second_old(a, b, c1), fms_conj_second_new(a, b, c2));
+ COMPARE(fms_conj_both_old(a, b, c1), fms_conj_both_new(a, b, c2));
+ COMPARE(mul0_old(a, b, c1), mul0_new(a, b, c2));
+ COMPARE(mul90_old(a, b, c1), mul90_new(a, b, c2));
+ COMPARE(mul180_old(a, b, c1), mul180_new(a, b, c2));
+ COMPARE(mul270_old(a, b, c1), mul270_new(a, b, c2));
+ COMPARE(mul0_snd_old(a, b, c1), mul0_snd_new(a, b, c2));
+ COMPARE(mul90_snd_old(a, b, c1), mul90_snd_new(a, b, c2));
+ COMPARE(mul180_snd_old(a, b, c1), mul180_snd_new(a, b, c2));
+ COMPARE(mul270_snd_old(a, b, c1), mul270_snd_new(a, b, c2));
+ COMPARE(mul_conj_first_old(a, b, c1), mul_conj_first_new(a, b, c2));
+ COMPARE(mul_conj_second_old(a, b, c1), mul_conj_second_new(a, b, c2));
+ COMPARE(mul_conj_both_old(a, b, c1), mul_conj_both_new(a, b, c2));
+ COMPARE(add0_old(a, b, c1), add0_new(a, b, c2));
+ COMPARE(add90_old(a, b, c1), add90_new(a, b, c2));
+ COMPARE(add180_old(a, b, c1), add180_new(a, b, c2));
+ COMPARE(add270_old(a, b, c1), add270_new(a, b, c2));
+ COMPARE(add0_snd_old(a, b, c1), add0_snd_new(a, b, c2));
+ COMPARE(add90_snd_old(a, b, c1), add90_snd_new(a, b, c2));
+ COMPARE(add180_snd_old(a, b, c1), add180_snd_new(a, b, c2));
+ COMPARE(add270_snd_old(a, b, c1), add270_snd_new(a, b, c2));
+ COMPARE(add_conj_first_old(a, b, c1), add_conj_first_new(a, b, c2));
+ COMPARE(add_conj_second_old(a, b, c1), add_conj_second_new(a, b, c2));
+ COMPARE(add_conj_both_old(a, b, c1), add_conj_both_new(a, b, c2));
+}
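The run test above depends on pieces defined outside this excerpt: the COMPARE macro and the setup of `a`. The `_old`/`_new` function names come from the MK (name, PREF) token pasting in complex-operations.c. Below is a hypothetical, self-contained sketch of how such a harness can fit together for a single case (add90_snd). The COMPARE body, the hand-expanded `_new` variant and `check_add90_snd` are assumptions for illustration, not code from the patch.

```c
#include <complex.h>
#include <stdlib.h>

/* Token-pasting helpers as in complex-operations.c.  FX pastes its
   arguments directly, so MK exists to macro-expand the prefix first:
   MK (add90_snd, old) becomes add90_snd_old.  */
#define FX(N,P) P ## _ ## N
#define MK(N,P) FX(P,N)

#define N 32
#define TYPE double

/* Reference variant: the plain C complex expression.  */
__attribute__((noinline, noipa))
void MK (add90_snd, old) (TYPE complex a[restrict N], TYPE complex b[restrict N],
                          TYPE complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + (b[i] * I);
}

/* Hypothetical second variant, hand-expanded into real and imaginary
   parts.  It stands in for the other build of complex-operations.c.  */
__attribute__((noinline, noipa))
void MK (add90_snd, new) (TYPE complex a[restrict N], TYPE complex b[restrict N],
                          TYPE complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    {
      TYPE re = creal (a[i]) - cimag (b[i]);
      TYPE im = cimag (a[i]) + creal (b[i]);
      c[i] = re + im * I;
    }
}

/* Hypothetical COMPARE: run both variants into separate buffers (c1 and
   c2), then abort on the first element whose real or imaginary parts
   differ by more than a small tolerance.  */
#define COMPARE(OLD_CALL, NEW_CALL)                                   \
  do                                                                  \
    {                                                                 \
      OLD_CALL;                                                       \
      NEW_CALL;                                                       \
      for (int k = 0; k < N; k++)                                     \
        {                                                             \
          TYPE diff1 = creal (c1[k]) - creal (c2[k]);                 \
          TYPE diff2 = cimag (c1[k]) - cimag (c2[k]);                 \
          if (diff1 > 1e-6 || diff1 < -1e-6                           \
              || diff2 > 1e-6 || diff2 < -1e-6)                       \
            abort ();                                                 \
        }                                                             \
    }                                                                 \
  while (0)

/* Sets up inputs (b alternates like the test's b[]) and returns 1 if
   COMPARE does not abort.  */
int check_add90_snd (void)
{
  TYPE complex a[N], b[N], c1[N] = { 0 }, c2[N] = { 0 };
  for (int i = 0; i < N; i++)
    {
      a[i] = (i + 1) + 2.0 * I;
      b[i] = (i % 2) ? 2.1 + 3.6 * I : 1.1 + 3.1 * I;
    }
  COMPARE (add90_snd_old (a, b, c1), add90_snd_new (a, b, c2));
  return 1;
}
```

Keeping two output buffers means a wrong vectorized result cannot be masked by the reference run writing to the same memory.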
diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
new file mode 100644
index 0000000000000000000000000000000000000000..fdce995481d23c6a536293c8ee59eaf9ca9239bf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
@@ -0,0 +1,358 @@
+#include <stdio.h>
+#include <complex.h>
+
+#ifndef PREF
+#define PREF c
+#endif
+
+#define FX(N,P) P ## _ ## N
+#define MK(N,P) FX(P,N)
+
+#define N 32
+#define TYPE double
+
+// ------ FMA
+
+// Complex FMA instructions rotating the result
+
+__attribute__((noinline,noipa))
+void MK(fma0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(fma90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * b[i] * I;
+}
+
+__attribute__((noinline,noipa))
+void MK(fma180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * b[i] * I * I;
+}
+
+__attribute__((noinline,noipa))
+void MK(fma270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * b[i] * I * I * I;
+}
+
+// Complex FMA instructions rotating the second parameter.
+
+
+__attribute__((noinline,noipa))
+void MK(fma0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(fma90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * (b[i] * I);
+}
+
+__attribute__((noinline,noipa))
+void MK(fma180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * (b[i] * I * I);
+}
+
+__attribute__((noinline,noipa))
+void MK(fma270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * (b[i] * I * I * I);
+}
+
+// Complex FMA instructions with conjugated values.
+
+
+__attribute__((noinline,noipa))
+void MK(fma_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += conj (a[i]) * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(fma_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += a[i] * conj (b[i]);
+}
+
+__attribute__((noinline,noipa))
+void MK(fma_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] += conj (a[i]) * conj (b[i]);
+}
+
+// ----- FMS
+
+// Complex FMS instructions rotating the result
+
+__attribute__((noinline,noipa))
+void MK(fms0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(fms90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * b[i] * I;
+}
+
+__attribute__((noinline,noipa))
+void MK(fms180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * b[i] * I * I;
+}
+
+__attribute__((noinline,noipa))
+void MK(fms270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * b[i] * I * I * I;
+}
+
+// Complex FMS instructions rotating the second parameter.
+
+__attribute__((noinline,noipa))
+void MK(fms0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(fms90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * (b[i] * I);
+}
+
+__attribute__((noinline,noipa))
+void MK(fms180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * (b[i] * I * I);
+}
+
+__attribute__((noinline,noipa))
+void MK(fms270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * (b[i] * I * I * I);
+}
+
+// Complex FMS instructions with conjugated values.
+
+__attribute__((noinline,noipa))
+void MK(fms_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= conj (a[i]) * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(fms_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= a[i] * conj (b[i]);
+}
+
+__attribute__((noinline,noipa))
+void MK(fms_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] -= conj (a[i]) * conj (b[i]);
+}
+
+
+// ----- MUL
+
+// Complex MUL instructions rotating the result
+
+__attribute__((noinline,noipa))
+void MK(mul0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(mul90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * b[i] * I;
+}
+
+__attribute__((noinline,noipa))
+void MK(mul180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * b[i] * I * I;
+}
+
+__attribute__((noinline,noipa))
+void MK(mul270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * b[i] * I * I * I;
+}
+
+// Complex MUL instructions rotating the second parameter.
+
+__attribute__((noinline,noipa))
+void MK(mul0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(mul90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * (b[i] * I);
+}
+
+__attribute__((noinline,noipa))
+void MK(mul180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * (b[i] * I * I);
+}
+
+__attribute__((noinline,noipa))
+void MK(mul270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * (b[i] * I * I * I);
+}
+
+// Complex MUL instructions with conjugated values.
+
+__attribute__((noinline,noipa))
+void MK(mul_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = conj (a[i]) * b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(mul_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] * conj (b[i]);
+}
+
+__attribute__((noinline,noipa))
+void MK(mul_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = conj (a[i]) * conj (b[i]);
+}
+
+
+// ----- ADD
+
+// Complex ADD instructions rotating the result
+
+__attribute__((noinline,noipa))
+void MK(add0, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(add90, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = (a[i] + b[i]) * I;
+}
+
+__attribute__((noinline,noipa))
+void MK(add180, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = (a[i] + b[i]) * I * I;
+}
+
+__attribute__((noinline,noipa))
+void MK(add270, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = (a[i] + b[i]) * I * I * I;
+}
+
+// Complex ADD instructions rotating the second parameter.
+
+__attribute__((noinline,noipa))
+void MK(add0_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(add90_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + (b[i] * I);
+}
+
+__attribute__((noinline,noipa))
+void MK(add180_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + (b[i] * I * I);
+}
+
+__attribute__((noinline,noipa))
+void MK(add270_snd, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + (b[i] * I * I * I);
+}
+
+// Complex ADD instructions with conjugated values.
+
+__attribute__((noinline,noipa))
+void MK(add_conj_first, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = conj (a[i]) + b[i];
+}
+
+__attribute__((noinline,noipa))
+void MK(add_conj_second, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = a[i] + conj (b[i]);
+}
+
+__attribute__((noinline,noipa))
+void MK(add_conj_both, PREF) (TYPE complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+ for (int i=0; i < N; i++)
+ c[i] = conj (a[i]) + conj (b[i]);
+}
+
+
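The add90_snd and add270_snd cases above are the forms this patch targets. Multiplying b by I rotates it by 90 degrees and multiplying by I * I * I rotates it by 270 degrees, so each case reduces to one subtraction and one addition on swapped components of b. A small sketch checking that identity with plain C complex arithmetic follows. The helper names add_rot90 and add_rot270 are illustrative only and are not GCC internals.

```c
#include <complex.h>

/* a + (b * I): rotating b by 90 degrees maps (b.re, b.im) to (-b.im, b.re).  */
static double complex add_rot90 (double complex a, double complex b)
{
  double re = creal (a) - cimag (b);
  double im = cimag (a) + creal (b);
  return re + im * I;
}

/* a + (b * I * I * I): rotating b by 270 degrees maps (b.re, b.im)
   to (b.im, -b.re).  */
static double complex add_rot270 (double complex a, double complex b)
{
  double re = creal (a) + cimag (b);
  double im = cimag (a) - creal (b);
  return re + im * I;
}
```

For example, with a = 1 + 2i and b = 3 + 5i, add_rot90 gives -4 + 5i and add_rot270 gives 6 - i, matching the plain complex expressions.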
diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex.exp b/gcc/testsuite/gcc.dg/vect/complex/complex.exp
new file mode 100644
index 0000000000000000000000000000000000000000..daeb02820ce3c83af0b5047cc25c7348790e1b8e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/complex.exp
@@ -0,0 +1,20 @@
+# Copyright (C) 1997-2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_file $srcdir/$subdir/../vect.exp
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c
new file mode 100644
index 0000000000000000000000000000000000000000..9a97d10357741eca73067b41ce7234e87b53a880
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_double } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE double
+#define N 16
+#include "complex-add-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c
new file mode 100644
index 0000000000000000000000000000000000000000..63ca9788063f473483064229836e0d0445ebc747
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_float } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE float
+#define N 16
+#include "complex-add-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c
new file mode 100644
index 0000000000000000000000000000000000000000..a6fb4296938112246e98bb45055b7d49df45b5d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_half } */
+/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE _Float16
+#define N 16
+#include "complex-add-template.c"
+
+/* Vectorization currently fails for these cases.  They should work, but are marked XFAIL for now.  */
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c
new file mode 100644
index 0000000000000000000000000000000000000000..4c0b9035677f53c792be3e53181eec1e688e0b3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_double } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE double
+#define N 16
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c
new file mode 100644
index 0000000000000000000000000000000000000000..18ad35316fbd45b228fd9c1b612590df446cf1a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_float } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE float
+#define N 16
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
new file mode 100644
index 0000000000000000000000000000000000000000..801d89dca34bf4771313a97b2bd1953383074d68
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_half } */
+/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE _Float16
+#define N 16
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { xfail arm*-*-* } } } */
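The "pattern" tests above work on the complex add written out by hand rather than through `_Complex`. Assuming the usual interleaved layout of C complex arrays (real part at the even index, imaginary part at the odd index), the lane-wise effect of COMPLEX_ADD_ROT90 and COMPLEX_ADD_ROT270 is roughly an alternating subtract/add with the two halves of b swapped. This is a hypothetical scalar reference for that lane behaviour, not the template's actual contents.

```c
#include <stddef.h>

/* ROT90: c.re = a.re - b.im, c.im = a.im + b.re, on interleaved storage.  */
static void ref_complex_add_rot90 (const double *a, const double *b,
                                   double *c, size_t n_pairs)
{
  for (size_t i = 0; i < 2 * n_pairs; i += 2)
    {
      c[i]     = a[i]     - b[i + 1];   /* even lane: subtract odd lane of b */
      c[i + 1] = a[i + 1] + b[i];       /* odd lane: add even lane of b */
    }
}

/* ROT270: c.re = a.re + b.im, c.im = a.im - b.re.  */
static void ref_complex_add_rot270 (const double *a, const double *b,
                                    double *c, size_t n_pairs)
{
  for (size_t i = 0; i < 2 * n_pairs; i += 2)
    {
      c[i]     = a[i]     + b[i + 1];
      c[i + 1] = a[i + 1] - b[i];
    }
}
```

Only the operator on each lane and the swap of b's lanes differ between the two rotations, which is why one instruction per rotation can replace the whole loop body.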
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b285b4f875aa2e8adc8daeb720c2f21e3dce38d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-double.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_double } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE double
+#define N 200
+#include "complex-add-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c
new file mode 100644
index 0000000000000000000000000000000000000000..f63d38433e53142eb0cc42968682201c1ad32140
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-float.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_float } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE float
+#define N 200
+#include "complex-add-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
new file mode 100644
index 0000000000000000000000000000000000000000..1736ab9037cd555b2d6dffc9b984ede469b9cf84
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-half-float.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_half } */
+/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE _Float16
+#define N 200
+#include "complex-add-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2 "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c
new file mode 100644
index 0000000000000000000000000000000000000000..6dd621ad1c07138501921241fb37e7f419e963c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_double } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE double
+#define N 200
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c
new file mode 100644
index 0000000000000000000000000000000000000000..e081abbc5f879385cd76d57359eb18e54cce911f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_float } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE float
+#define N 200
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c
new file mode 100644
index 0000000000000000000000000000000000000000..b368e086083c5a4c59a383acce8ec770dca9277c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_half } */
+/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE _Float16
+#define N 200
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 4 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c
new file mode 100644
index 0000000000000000000000000000000000000000..b98dabe4a527f2382a2bd9dc5b04fe0a4435a021
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-byte.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_byte } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE int8_t
+#define N 200
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5d565a748e0d3fd157bbd3245b683a3114ade7f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-int.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_int } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE int32_t
+#define N 200
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c
new file mode 100644
index 0000000000000000000000000000000000000000..b67d3cea3b5e469121e1f870819713046c619bc7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-long.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_long } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE int64_t
+#define N 200
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c
new file mode 100644
index 0000000000000000000000000000000000000000..1aa3609a23e83e2b26a990cd5ca246d9ed5e8e58
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-short.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_short } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE int16_t
+#define N 200
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c
new file mode 100644
index 0000000000000000000000000000000000000000..c8d45cfa755a15732ce0cfefc2798c03598cc224
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_byte } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE uint8_t
+#define N 200
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c
new file mode 100644
index 0000000000000000000000000000000000000000..7a9aaf88e1cddbc867832036c5b26337666bcd14
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_int } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE uint32_t
+#define N 200
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4ac5269176cbd9ad3f5a02f96a0650e8790afdd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_long } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE uint64_t
+#define N 200
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c
new file mode 100644
index 0000000000000000000000000000000000000000..8203ebc07a4c6dc9704c92700badb160f1a51223
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_complex_add_short } */
+/* { dg-require-effective-target stdint_types } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+
+#define TYPE uint16_t
+#define N 200
+#include <stdint.h>
+#include "complex-add-pattern-template.c"
+
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "vect" } } */
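The integer tests above expect the same single ROT90 and ROT270 match for unsigned element types as for signed ones. A hypothetical uint8_t version of the scalar reference shows why this is unproblematic: unsigned arithmetic wraps modulo 2^8, so the even-lane subtraction stays well defined even when b.im exceeds a.re. The function name is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* ROT90 on interleaved uint8_t data; results wrap modulo 256.  */
static void ref_complex_add_rot90_u8 (const uint8_t *a, const uint8_t *b,
                                      uint8_t *c, size_t n_pairs)
{
  for (size_t i = 0; i < 2 * n_pairs; i += 2)
    {
      c[i]     = (uint8_t) (a[i] - b[i + 1]);   /* e.g. 10 - 30 wraps to 236 */
      c[i + 1] = (uint8_t) (a[i + 1] + b[i]);   /* e.g. 250 + 100 wraps to 94 */
    }
}
```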
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 89c4f67554f6da90fe420694ec2a61e6770f04a8..4af9c5f9e55bbe4d1b15fa002348ae1befc6a1af 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3367,7 +3367,116 @@ proc check_effective_target_vect_int { } {
}}]
}
-# Return 1 if the target supports signed int->float conversion
+# Return 1 if the target supports hardware vectorization of complex additions of
+# byte, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_complex_add_byte { } {
+ return [check_cached_effective_target_indexed vect_complex_add_byte {
+ expr {
+ ([check_effective_target_aarch64_sve2]
+ && [check_effective_target_aarch64_little_endian])
+ || ([check_effective_target_arm_v8_1m_mve_fp_ok]
+ && [check_effective_target_arm_little_endian])
+ }}]
+}
+
+# Return 1 if the target supports hardware vectorization of complex additions of
+# short, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_complex_add_short { } {
+ return [check_cached_effective_target_indexed vect_complex_add_short {
+ expr {
+ ([check_effective_target_aarch64_sve2]
+ && [check_effective_target_aarch64_little_endian])
+ || ([check_effective_target_arm_v8_1m_mve_fp_ok]
+ && [check_effective_target_arm_little_endian])
+ }}]
+}
+
+# Return 1 if the target supports hardware vectorization of complex additions of
+# int, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_complex_add_int { } {
+ return [check_cached_effective_target_indexed vect_complex_add_int {
+ expr {
+ ([check_effective_target_aarch64_sve2]
+ && [check_effective_target_aarch64_little_endian])
+ || ([check_effective_target_arm_v8_1m_mve_fp_ok]
+ && [check_effective_target_arm_little_endian])
+ }}]
+}
+
+# Return 1 if the target supports hardware vectorization of complex additions of
+# long, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_complex_add_long { } {
+ return [check_cached_effective_target_indexed vect_complex_add_long {
+ expr {
+ ([check_effective_target_aarch64_sve2]
+ && [check_effective_target_aarch64_little_endian])
+ || ([check_effective_target_arm_v8_1m_mve_fp_ok]
+ && [check_effective_target_arm_little_endian])
+ }}]
+}
+
+# Return 1 if the target supports hardware vectorization of complex additions of
+# half, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_complex_add_half { } {
+ return [check_cached_effective_target_indexed vect_complex_add_half {
+ expr {
+ ([check_effective_target_arm_v8_3a_fp16_complex_neon_ok]
+ && ([check_effective_target_aarch64_little_endian]
+ || [check_effective_target_arm_little_endian]))
+ || ([check_effective_target_aarch64_sve2]
+ && [check_effective_target_aarch64_little_endian])
+ || ([check_effective_target_arm_v8_1m_mve_fp_ok]
+ && [check_effective_target_arm_little_endian])
+ }}]
+}
+
+# Return 1 if the target supports hardware vectorization of complex additions of
+# float, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_complex_add_float { } {
+ return [check_cached_effective_target_indexed vect_complex_add_float {
+ expr {
+ ([check_effective_target_arm_v8_3a_complex_neon_ok]
+ && ([check_effective_target_aarch64_little_endian]
+ || [check_effective_target_arm_little_endian]))
+ || ([check_effective_target_aarch64_sve2]
+ && [check_effective_target_aarch64_little_endian])
+ || ([check_effective_target_arm_v8_1m_mve_fp_ok]
+ && [check_effective_target_arm_little_endian])
+ }}]
+}
+
+# Return 1 if the target supports hardware vectorization of complex additions of
+# double, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_complex_add_double { } {
+ return [check_cached_effective_target_indexed vect_complex_add_double {
+ expr {
+ ([check_effective_target_aarch64_sve2]
+ && [check_effective_target_aarch64_little_endian])
+ }}]
+}
+
+# Return 1 if the target supports signed int->float conversion
#
proc check_effective_target_vect_intfloat_cvt { } {
@@ -10386,13 +10495,13 @@ proc check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } {
# need to be added to the -march option.
foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat-abi=hard -mfpu=auto"} {
if { [check_no_compiler_messages_nocache \
- arm_v8_3a_complex_neon_ok object {
+ arm_v8_3a_complex_neon_ok assembly {
#if !defined (__ARM_FEATURE_COMPLEX)
#error "__ARM_FEATURE_COMPLEX not defined"
#endif
} "$flags -march=armv8.3-a"] } {
set et_arm_v8_3a_complex_neon_flags "$flags -march=armv8.3-a"
- return 1
+ return 1;
}
}
@@ -10412,13 +10521,57 @@ proc add_options_for_arm_v8_3a_complex_neon { flags } {
return "$flags $et_arm_v8_3a_complex_neon_flags"
}
+# Return 1 if the target supports ARMv8.3 Adv.SIMD + FP16 Complex
+# instructions, 0 otherwise. The test is valid for ARM and for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache { } {
+ global et_arm_v8_3a_fp16_complex_neon_flags
+ set et_arm_v8_3a_fp16_complex_neon_flags ""
+
+ if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+ return 0;
+ }
+
+ # Iterate through sets of options to find the compiler flags that
+ # need to be added to the -march option.
+ foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat-abi=hard -mfpu=auto"} {
+ if { [check_no_compiler_messages_nocache \
+ arm_v8_3a_fp16_complex_neon_ok assembly {
+ #if !defined (__ARM_FEATURE_COMPLEX)
+ #error "__ARM_FEATURE_COMPLEX not defined"
+ #endif
+ } "$flags -march=armv8.3-a+fp16"] } {
+ set et_arm_v8_3a_fp16_complex_neon_flags \
+ "$flags -march=armv8.3-a+fp16"
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok { } {
+ return [check_cached_effective_target arm_v8_3a_fp16_complex_neon_ok \
+ check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache]
+}
+
+proc add_options_for_arm_v8_3a_fp16_complex_neon { flags } {
+ if { ! [check_effective_target_arm_v8_3a_fp16_complex_neon_ok] } {
+ return "$flags"
+ }
+ global et_arm_v8_3a_fp16_complex_neon_flags
+ return "$flags $et_arm_v8_3a_fp16_complex_neon_flags"
+}
+
+
# Return 1 if the target supports executing AdvSIMD instructions from ARMv8.3
# with the complex instruction extension, 0 otherwise. The test is valid for
# ARM and for AArch64.
proc check_effective_target_arm_v8_3a_complex_neon_hw { } {
if { ![check_effective_target_arm_v8_3a_complex_neon_ok] } {
- return 0;
+ return 1;
}
return [check_runtime arm_v8_3a_complex_neon_hw_available {
#include "arm_neon.h"
@@ -10443,7 +10596,7 @@ proc check_effective_target_arm_v8_3a_complex_neon_hw { } {
: /* No clobbers. */);
#endif
- return (results[0] == 8 && results[1] == 24) ? 1 : 0;
+ return (results[0] == 8 && results[1] == 24) ? 0 : 1;
}
} [add_options_for_arm_v8_3a_complex_neon ""]]
}
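For reference, the scalar shape that the vect_complex_add_* effective-target checks above are meant to guard can be sketched in plain C. This is a minimal sketch that assumes interleaved {real, imag} float arrays. The function names complex_add_rot90 and complex_add_rot270 are hypothetical and do not come from the patch's testsuite. Rotation 90 computes a + (b * I) and rotation 270 computes a + (b * I * I * I), which correspond to the "- +" and "+ -" operation pairs the pattern matcher looks for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: c = a + b * I (rotation 90) on interleaved
   {real, imag} float arrays holding N complex values (2 * N floats).  */
void
complex_add_rot90 (const float *restrict a, const float *restrict b,
                   float *restrict c, size_t n)
{
  for (size_t i = 0; i < 2 * n; i += 2)
    {
      c[i] = a[i] - b[i + 1];     /* real: a.re - b.im  */
      c[i + 1] = a[i + 1] + b[i]; /* imag: a.im + b.re  */
    }
}

/* Hypothetical kernel: c = a + b * I * I * I (rotation 270), i.e. a - b * I.  */
void
complex_add_rot270 (const float *restrict a, const float *restrict b,
                    float *restrict c, size_t n)
{
  for (size_t i = 0; i < 2 * n; i += 2)
    {
      c[i] = a[i] + b[i + 1];     /* real: a.re + b.im  */
      c[i + 1] = a[i + 1] - b[i]; /* imag: a.im - b.re  */
    }
}
```

The two loops differ only in the sign used for each lane. That sign pattern is what complex_add_pattern::matches uses to choose between IFN_COMPLEX_ADD_ROT90 and IFN_COMPLEX_ADD_ROT270.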
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 72bbec4b45d225d73eddfaca69468cd23842a42a..52757add0e3dbae41608a1786661b326f0da9be9 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2698,9 +2698,13 @@ again:
STMT_SLP_TYPE (stmt_info) = loop_vect;
if (STMT_VINFO_IN_PATTERN_P (stmt_info))
{
+ stmt_vec_info pattern_stmt_info
+ = STMT_VINFO_RELATED_STMT (stmt_info);
+ if (STMT_VINFO_SLP_VECT_ONLY (pattern_stmt_info))
+ STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
+
gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
- stmt_info = STMT_VINFO_RELATED_STMT (stmt_info);
- STMT_SLP_TYPE (stmt_info) = loop_vect;
+ STMT_SLP_TYPE (pattern_stmt_info) = loop_vect;
for (gimple_stmt_iterator pi = gsi_start (pattern_def_seq);
!gsi_end_p (pi); gsi_next (&pi))
STMT_SLP_TYPE (loop_vinfo->lookup_stmt (gsi_stmt (pi)))
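A toy model makes the new rule in the tree-vect-loop.c hunk explicit. When SLP is abandoned and a statement's pattern replacement is marked SLP-only, the original statement stops being treated as part of a pattern. The struct and function below are hypothetical simplifications of stmt_vec_info and of the reset loop, not GCC code. They model only the four fields the hunk touches, and the pattern def sequence walk is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for stmt_vec_info.  */
enum slp_type { LOOP_VECT, PURE_SLP };

struct stmt_info
{
  bool in_pattern;           /* models STMT_VINFO_IN_PATTERN_P  */
  bool slp_vect_only;        /* models STMT_VINFO_SLP_VECT_ONLY  */
  enum slp_type slp_type;    /* models STMT_SLP_TYPE  */
  struct stmt_info *related; /* models STMT_VINFO_RELATED_STMT  */
};

/* Reset STMT to loop vectorization after SLP has been abandoned.  If its
   pattern statement may only be vectorized with SLP, the original statement
   is no longer marked as being in a pattern.  The pattern statement itself
   is still reset to loop_vect, as in the hunk.  */
static void
reset_to_loop_vect (struct stmt_info *stmt)
{
  stmt->slp_type = LOOP_VECT;
  if (stmt->in_pattern)
    {
      struct stmt_info *pattern = stmt->related;
      assert (pattern != NULL);
      if (pattern->slp_vect_only)
        stmt->in_pattern = false;
      pattern->slp_type = LOOP_VECT;
    }
}
```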
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
new file mode 100644
index 0000000000000000000000000000000000000000..bb2830d1d35d1607d1566868ffbcead97e4790d7
--- /dev/null
+++ b/gcc/tree-vect-slp-patterns.c
@@ -0,0 +1,720 @@
+/* SLP - Pattern matcher on SLP trees
+ Copyright (C) 2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3. If not see
+<http://www.gnu.org/licenses/>. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h" /* FIXME: for insn_data */
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "gimple-iterator.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"
+#include "langhooks.h"
+#include "gimple-walk.h"
+#include "dbgcnt.h"
+#include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
+#include "gimple-fold.h"
+#include "internal-fn.h"
+
+/* SLP Pattern matching mechanism.
+
+ This extension to the SLP vectorizer allows one to transform the generated SLP
+ tree based on any pattern. The difference between this and the normal vect
+ pattern matcher is that unlike the former, this matcher allows you to match
+ with instructions that do not belong to the same SSA dominator graph.
+
+ The only requirement that this pattern matcher has is that you are
+ only allowed to either match an entire group or none.
+
+ The pattern matcher currently only allows you to perform replacements to
+ internal functions.
+
+ Once the patterns are matched the change is one way and cannot be undone.
+ It is currently not supported to match patterns recursively.
+
+ To add a new pattern, implement the vect_pattern class and add the type to
+ slp_patterns.
+
+*/
+
+/*******************************************************************************
+ * vect_pattern class
+ ******************************************************************************/
+
+/* Return true if the target supports internal function IFN for the vector
+ type of NODE. Return false for IFN_LAST or when NODE has no vector type. */
+
+static bool
+vect_pattern_validate_optab (internal_fn ifn, slp_tree node)
+{
+ tree vectype = SLP_TREE_VECTYPE (node);
+ if (ifn == IFN_LAST || !vectype)
+ return false;
+
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Found %s pattern in SLP tree\n",
+ internal_fn_name (ifn));
+
+ if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Target supports %s vectorization with mode %T\n",
+ internal_fn_name (ifn), vectype);
+ }
+ else
+ {
+ if (dump_enabled_p ())
+ {
+ if (!vectype)
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Target does not support vector type for %T\n",
+ SLP_TREE_DEF_TYPE (node));
+ else
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Target does not support %s for vector type "
+ "%T\n", internal_fn_name (ifn), vectype);
+ }
+ return false;
+ }
+ return true;
+}
+
+/*******************************************************************************
+ * General helper types
+ ******************************************************************************/
+
+/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can
+ be matched when looking for expressions that we are interested in matching
+ for complex number addition and mla. */
+
+typedef enum _complex_operation : unsigned {
+ PLUS_PLUS,
+ MINUS_PLUS,
+ PLUS_MINUS,
+ MULT_MULT,
+ CMPLX_NONE
+} complex_operation_t;
+
+/*******************************************************************************
+ * General helper functions
+ ******************************************************************************/
+
+/* Helper function of linear_loads_p that classifies the load permutation
+ LOADS as in order (PERM_EVENODD), pairwise swapped (PERM_ODDEVEN), all 1
+ (PERM_ODDODD) or all 0 (PERM_EVENEVEN), else returns PERM_UNKNOWN. */
+
+static inline complex_perm_kinds_t
+is_linear_load_p (load_permutation_t loads)
+{
+ if (loads.length() == 0)
+ return PERM_UNKNOWN;
+
+ unsigned load, i;
+ complex_perm_kinds_t candidates[4]
+ = { PERM_EVENODD
+ , PERM_ODDEVEN
+ , PERM_ODDODD
+ , PERM_EVENEVEN
+ };
+
+ int valid_patterns = 4;
+ FOR_EACH_VEC_ELT (loads, i, load)
+ {
+ if (candidates[0] != PERM_UNKNOWN && load != i)
+ {
+ candidates[0] = PERM_UNKNOWN;
+ valid_patterns--;
+ }
+ if (candidates[1] != PERM_UNKNOWN
+ && load != (i % 2 == 0 ? i + 1 : i - 1))
+ {
+ candidates[1] = PERM_UNKNOWN;
+ valid_patterns--;
+ }
+ if (candidates[2] != PERM_UNKNOWN && load != 1)
+ {
+ candidates[2] = PERM_UNKNOWN;
+ valid_patterns--;
+ }
+ if (candidates[3] != PERM_UNKNOWN && load != 0)
+ {
+ candidates[3] = PERM_UNKNOWN;
+ valid_patterns--;
+ }
+
+ if (valid_patterns == 0)
+ return PERM_UNKNOWN;
+ }
+
+ for (i = 0; i < ARRAY_SIZE (candidates); i++)
+ if (candidates[i] != PERM_UNKNOWN)
+ return candidates[i];
+
+ return PERM_UNKNOWN;
+}
+
+/* Combine complex_perm_kinds A and B into a new permute kind that describes the
+ resulting operation. */
+
+static inline complex_perm_kinds_t
+vect_merge_perms (complex_perm_kinds_t a, complex_perm_kinds_t b)
+{
+ if (a == b)
+ return a;
+
+ if (a == PERM_TOP)
+ return b;
+
+ if (b == PERM_TOP)
+ return a;
+
+ return PERM_UNKNOWN;
+}
+
+/* Check to see if all loads rooted in ROOT are linear. Linearity is
+ defined as having no gaps between values loaded. */
+
+static complex_load_perm_t
+linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root)
+{
+ if (!root)
+ return std::make_pair (PERM_UNKNOWN, vNULL);
+
+ unsigned i;
+ complex_load_perm_t *tmp;
+
+ if ((tmp = perm_cache->get (root)) != NULL)
+ return *tmp;
+
+ complex_load_perm_t retval = std::make_pair (PERM_UNKNOWN, vNULL);
+ perm_cache->put (root, retval);
+
+ /* If it's a load node, then just read the load permute. */
+ if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
+ {
+ retval.first = is_linear_load_p (SLP_TREE_LOAD_PERMUTATION (root));
+ retval.second = SLP_TREE_LOAD_PERMUTATION (root);
+ perm_cache->put (root, retval);
+ return retval;
+ }
+ else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
+ {
+ retval.first = PERM_TOP;
+ return retval;
+ }
+
+ auto_vec<load_permutation_t> all_loads;
+ complex_perm_kinds_t kind = PERM_TOP;
+
+ slp_tree child;
+ FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
+ {
+ complex_load_perm_t res = linear_loads_p (perm_cache, child);
+ kind = vect_merge_perms (kind, res.first);
+ if (kind == PERM_UNKNOWN)
+ return retval;
+ all_loads.safe_push (res.second);
+ }
+
+ if (SLP_TREE_LANE_PERMUTATION (root).exists ())
+ {
+ lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
+ load_permutation_t nloads;
+ nloads.create (SLP_TREE_LANES (root));
+ nloads.quick_grow (SLP_TREE_LANES (root));
+ for (i = 0; i < SLP_TREE_LANES (root); i++)
+ nloads[i] = all_loads[perm[i].first][perm[i].second];
+
+ retval.first = kind;
+ retval.second = nloads;
+ }
+ else if (all_loads.length () == 1)
+ {
+ retval.first = kind;
+ retval.second = all_loads[0];
+ }
+
+ perm_cache->put (root, retval);
+ return retval;
+}
+
+
+/* Build and return a new VEC_PERM_EXPR node whose only child is NODE and
+ whose lane permute swaps each even lane with the odd lane that follows it,
+ i.e. { 0[1] 0[0] 0[3] 0[2] ... }. NODE gains one reference from the new
+ node. The swap assumes that NODE has an even number of lanes.
+
+ When the loads reachable from NODE are in odd/even order, the new node is
+ linear. Here linearity is defined as having a sequential, monotonically
+ increasing load position inside the load permute generated by the loads
+ reachable from NODE.
+*/
+
+static slp_tree
+vect_build_swap_evenodd_node (slp_tree node)
+{
+ /* Attempt to linearise the permute. */
+ vec<std::pair<unsigned, unsigned> > zipped;
+ zipped.create (SLP_TREE_LANES (node));
+
+ for (unsigned x = 0; x < SLP_TREE_LANES (node); x+=2)
+ {
+ zipped.quick_push (std::make_pair (0, x+1));
+ zipped.quick_push (std::make_pair (0, x));
+ }
+
+ /* Create the new permute node and store it instead. */
+ slp_tree vnode = vect_create_new_slp_node (1, VEC_PERM_EXPR);
+ SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
+ SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (node);
+ SLP_TREE_CHILDREN (vnode).quick_push (node);
+ SLP_TREE_REF_COUNT (vnode) = 1;
+ SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
+ SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (node);
+ SLP_TREE_REF_COUNT (node)++;
+ return vnode;
+}
+
+/* Check whether the expression represented by NODE is a gimple assignment
+ with code CODE. */
+
+static inline bool
+vect_match_expression_p (slp_tree node, tree_code code)
+{
+ if (!node
+ || !SLP_TREE_REPRESENTATIVE (node))
+ return false;
+
+ gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
+ if (!is_gimple_assign (expr)
+ || gimple_assign_rhs_code (expr) != code)
+ return false;
+
+ return true;
+}
+
+/* Check if the given lane permute in PERMUTES matches an alternating sequence
+ of {even odd even odd ...} to account for unrolled loops. Furthermore, lane
+ I must select lane I of its operand, so the resulting permute is linear. */
+
+static inline bool
+vect_check_evenodd_blend (lane_permutation_t &permutes,
+ unsigned even, unsigned odd)
+{
+ if (permutes.length () == 0)
+ return false;
+
+ unsigned val[2] = {even, odd};
+ unsigned seed = 0;
+ for (unsigned i = 0; i < permutes.length (); i++)
+ if (permutes[i].first != val[i % 2]
+ || permutes[i].second != seed++)
+ return false;
+
+ return true;
+}
+
+/* This function will match the two gimple expressions representing NODE1 and
+ NODE2 in parallel and returns the pair operation that represents the two
+ expressions in the two statements.
+
+ If match is successful then the corresponding complex_operation is
+ returned and, if OPS is non-NULL, NODE1 and NODE2 are pushed onto OPS.
+
+ If TWO_OPERANDS it is expected that the LANES of the parent VEC_PERM select
+ from the two nodes alternatingly.
+
+ If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
+
+ e.g. the following gimple statements
+
+ stmt 0 _39 = _37 + _12;
+ stmt 1 _6 = _38 - _36;
+
+ will return PLUS_MINUS, with OPS containing NODE1 and NODE2 in that order.
+*/
+
+static complex_operation_t
+vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t &lanes,
+ bool two_operands = true, vec<slp_tree> *ops = NULL)
+{
+ complex_operation_t result = CMPLX_NONE;
+
+ if (vect_match_expression_p (node1, MINUS_EXPR)
+ && vect_match_expression_p (node2, PLUS_EXPR)
+ && (!two_operands || vect_check_evenodd_blend (lanes, 0, 1)))
+ result = MINUS_PLUS;
+ else if (vect_match_expression_p (node1, PLUS_EXPR)
+ && vect_match_expression_p (node2, MINUS_EXPR)
+ && (!two_operands || vect_check_evenodd_blend (lanes, 0, 1)))
+ result = PLUS_MINUS;
+ else if (vect_match_expression_p (node1, PLUS_EXPR)
+ && vect_match_expression_p (node2, PLUS_EXPR))
+ result = PLUS_PLUS;
+ else if (vect_match_expression_p (node1, MULT_EXPR)
+ && vect_match_expression_p (node2, MULT_EXPR))
+ result = MULT_MULT;
+
+ if (result != CMPLX_NONE && ops != NULL)
+ {
+ ops->create (2);
+ ops->quick_push (node1);
+ ops->quick_push (node2);
+ }
+ return result;
+}
+
+/* Overload of vect_detect_pair_op that matches against the representative
+ statements in the children of NODE. It is expected that NODE has exactly
+ two children and when TWO_OPERANDS then NODE must be a VEC_PERM. */
+
+static complex_operation_t
+vect_detect_pair_op (slp_tree node, bool two_operands = true,
+ vec<slp_tree> *ops = NULL)
+{
+ if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
+ return CMPLX_NONE;
+
+ if (SLP_TREE_CHILDREN (node).length () != 2)
+ return CMPLX_NONE;
+
+ vec<slp_tree> children = SLP_TREE_CHILDREN (node);
+ lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
+
+ return vect_detect_pair_op (children[0], children[1], lanes, two_operands,
+ ops);
+}
+
+/*******************************************************************************
+ * complex_pattern class
+ ******************************************************************************/
+
+/* SLP Complex Numbers pattern matching.
+
+ As an example, the following simple loop:
+
+ double a[restrict N]; double b[restrict N]; double c[restrict N];
+
+ for (int i=0; i < N; i+=2)
+ {
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+ }
+
+ which represents a complex addition with a rotation of 90 degrees around the
+ Argand plane, i.e. if `a` and `b` were complex numbers then this would be the
+ same as `a + (b * I)`.
+
+ Here the expressions for `c[i]` and `c[i+1]` are independent but have to be
+ both recognized in order for the pattern to work. As an SLP tree this is
+ represented as
+
+ +--------------------------------+
+ | stmt 0 *_9 = _10; |
+ | stmt 1 *_15 = _16; |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | stmt 0 _10 = _4 - _8; |
+ | stmt 1 _16 = _12 + _14; |
+ | lane permutation { 0[0] 1[1] } |
+ +--------------------------------+
+ | |
+ | |
+ | |
+ +-----+ | | +-----+
+ | | | | | |
+ +-----| { } |<-----+ +----->| { } --------+
+ | | | +------------------| | |
+ | +-----+ | +-----+ |
+ | | | |
+ | | | |
+ | +------|------------------+ |
+ | | | |
+ v v v v
+ +--------------------------+ +--------------------------------+
+ | stmt 0 _8 = *_7; | | stmt 0 _4 = *_3; |
+ | stmt 1 _14 = *_13; | | stmt 1 _12 = *_11; |
+ | load permutation { 1 0 } | | load permutation { 0 1 } |
+ +--------------------------+ +--------------------------------+
+
+ The pattern matcher allows you to replace both statements 0 and 1 or none at
+ all. Because this operation is a two-operand operation the actual nodes
+ being replaced are those in the { } nodes. The actual scalar statements
+ themselves are not replaced or used during the matching but instead the
+ SLP_TREE_REPRESENTATIVE statements are inspected. You are also allowed to
+ replace and match on any number of nodes.
+
+ Because the pattern matcher matches on the representative statement of the
+ SLP node, in the case of two_operators it also lets you match on the children
+ of the node. This is done using the method `recognize ()`.
+
+*/
+
+/* The complex_pattern class contains common code for pattern matchers that work
+ on complex numbers. These provide functionality to allow de-construction and
+ validation of sequences depicting/transforming REAL and IMAG pairs. */
+
+class complex_pattern : public vect_pattern
+{
+ protected:
+ auto_vec<slp_tree> m_workset;
+ complex_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
+ : vect_pattern (node, m_ops, ifn)
+ {
+ this->m_workset.safe_push (*node);
+ }
+
+ public:
+ void build (vec_info *);
+
+ static internal_fn
+ matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+ vec<slp_tree> *);
+};
+
+/* Create a replacement pattern statement for each node in m_workset and make
+ it the node's new representative statement. The old statement is marked as
+ being in a pattern defined by the new statement. The new statement is a call
+ to the internal function m_ifn with m_num_args arguments.
+
+ Furthermore the new pattern is also added to the vectorization information
+ structure VINFO and the old statement STMT_INFO is marked as unused while
+ the new statement is marked as used and the number of SLP uses of the new
+ statement is incremented.
+
+ The newly created SLP nodes are marked as SLP only and will be dissolved
+ if SLP is aborted.
+
+ Each node's lane permutation is released and its code becomes CALL_EXPR;
+ the BB itself remains unchanged.
+
+ This default method is designed to only match against simple operands where
+ all the input and output types are the same.
+*/
+
+void
+complex_pattern::build (vec_info *vinfo)
+{
+ stmt_vec_info stmt_info;
+
+ auto_vec<tree> args;
+ args.create (this->m_num_args);
+ args.quick_grow_cleared (this->m_num_args);
+ slp_tree node;
+ unsigned ix;
+ stmt_vec_info call_stmt_info;
+ gcall *call_stmt = NULL;
+
+ /* Now modify the nodes themselves. */
+ FOR_EACH_VEC_ELT (this->m_workset, ix, node)
+ {
+ /* Calculate the location of the statement in NODE to replace. */
+ stmt_info = SLP_TREE_REPRESENTATIVE (node);
+ gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
+ tree lhs_old_stmt = gimple_get_lhs (old_stmt);
+ tree type = TREE_TYPE (lhs_old_stmt);
+
+ /* Create the argument set for use by gimple_build_call_internal_vec. */
+ for (unsigned i = 0; i < this->m_num_args; i++)
+ args[i] = lhs_old_stmt;
+
+ /* Create the new pattern statements. */
+ call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
+ tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
+ gimple_call_set_lhs (call_stmt, var);
+ gimple_set_location (call_stmt, gimple_location (old_stmt));
+ gimple_call_set_nothrow (call_stmt, true);
+
+ /* Adjust the book-keeping for the new and old statements for use during
+ SLP. This is required to get the right VF and statement during SLP
+ analysis. These changes are made after relevancy has been set for the
+ nodes, so we need to update them manually. Any changes will be undone
+ if SLP is cancelled. */
+ call_stmt_info
+ = vinfo->add_pattern_stmt (call_stmt, stmt_info);
+
+ /* Make sure to mark the representative statement pure_slp and
+ relevant. */
+ STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope;
+ STMT_SLP_TYPE (call_stmt_info) = pure_slp;
+
+ /* add_pattern_stmt can't be done in vect_mark_pattern_stmts because
+ the non-SLP pattern matchers already have added the statement to VINFO
+ by the time it is called. Some of them need to modify the returned
+ stmt_info. vect_mark_pattern_stmts is called by recog_pattern and it
+ would increase the size of each pattern with boilerplate code to make
+ the call there. */
+ vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
+ SLP_TREE_VECTYPE (node));
+ STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
+
+ /* Since we are replacing all the statements in the group with the same
+ thing it doesn't really matter. So just set it every time a new stmt
+ is created. */
+ SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
+ SLP_TREE_LANE_PERMUTATION (node).release ();
+ SLP_TREE_CODE (node) = CALL_EXPR;
+ }
+}
+
+/*******************************************************************************
+ * complex_add_pattern class
+ ******************************************************************************/
+
+class complex_add_pattern : public complex_pattern
+{
+ protected:
+ complex_add_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
+ : complex_pattern (node, m_ops, ifn)
+ {
+ this->m_num_args = 2;
+ }
+
+ public:
+ void build (vec_info *);
+ static internal_fn
+ matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+ vec<slp_tree> *);
+
+ static vect_pattern*
+ recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+};
+
+/* Perform a replacement of the detected complex add pattern with the new
+ instruction sequences. */
+
+void
+complex_add_pattern::build (vec_info *vinfo)
+{
+ auto_vec<slp_tree> nodes;
+ slp_tree node = this->m_ops[0];
+ vec<slp_tree> children = SLP_TREE_CHILDREN (node);
+
+ /* First re-arrange the children. */
+ nodes.create (children.length ());
+ nodes.quick_push (children[0]);
+ nodes.quick_push (vect_build_swap_evenodd_node (children[1]));
+
+ SLP_TREE_CHILDREN (*this->m_node).truncate (0);
+ SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
+
+ complex_pattern::build (vinfo);
+}
+
+/* Pattern matcher for trying to match complex addition pattern in SLP tree.
+
+ Return the internal function to use on a match, otherwise IFN_LAST.
+ This function matches the patterns shaped as:
+
+ c[i] = a[i] - b[i+1];
+ c[i+1] = a[i+1] + b[i];
+
+ OP is the operation pair detected by vect_detect_pair_op and OPS holds the
+ two matched nodes. PERM_CACHE caches load permutations per node. */
+
+internal_fn
+complex_add_pattern::matches (complex_operation_t op,
+ slp_tree_to_load_perm_map_t *perm_cache,
+ vec<slp_tree> *ops)
+{
+ internal_fn ifn = IFN_LAST;
+
+ /* Find the two components. Rotation in the complex plane will modify
+ the operations:
+
+ * Rotation 0: + +
+ * Rotation 90: - +
+ * Rotation 180: - -
+ * Rotation 270: + -
+
+ Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+ to care about them here. */
+ if (op == MINUS_PLUS)
+ ifn = IFN_COMPLEX_ADD_ROT90;
+ else if (op == PLUS_MINUS)
+ ifn = IFN_COMPLEX_ADD_ROT270;
+ else
+ return ifn;
+
+ /* Verify that there is a permute, otherwise this isn't a pattern we
+ support. */
+ gcc_assert (ops->length () == 2);
+
+ vec<slp_tree> children = SLP_TREE_CHILDREN ((*ops)[0]);
+
+ /* First node must be unpermuted. */
+ if (linear_loads_p (perm_cache, children[0]).first != PERM_EVENODD)
+ return IFN_LAST;
+
+ /* Second node must be permuted. */
+ if (linear_loads_p (perm_cache, children[1]).first != PERM_ODDEVEN)
+ return IFN_LAST;
+
+ return ifn;
+}
+
+/* Attempt to recognize a complex add pattern. */
+
+vect_pattern*
+complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+ slp_tree *node)
+{
+ auto_vec<slp_tree> ops;
+ complex_operation_t op
+ = vect_detect_pair_op (*node, true, &ops);
+ internal_fn ifn = complex_add_pattern::matches (op, perm_cache, &ops);
+ if (!vect_pattern_validate_optab (ifn, *node))
+ return NULL;
+
+ return new complex_add_pattern (node, &ops, ifn);
+}
+
+/*******************************************************************************
+ * Pattern matching definitions
+ ******************************************************************************/
+
+#define SLP_PATTERN(x) &x::recognize
+vect_pattern_decl_t slp_patterns[]
+{
+ /* For the least amount of back-tracking and more efficient matching,
+ order patterns from the largest to the smallest, especially if they
+ overlap in what they can detect. */
+
+ SLP_PATTERN (complex_add_pattern),
+};
+#undef SLP_PATTERN
+
+/* Set the number of SLP pattern matchers available. */
+size_t num__slp_patterns = sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
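The load-permutation checks that complex_add_pattern::matches relies on can be modelled outside GCC. The sketch below uses a plain array of lane indices in place of load_permutation_t and a hypothetical enum in place of complex_perm_kinds_t. classify_loads mirrors the candidate elimination in is_linear_load_p when every lane, including lane 0, is inspected. swap_evenodd applies the lane permute that vect_build_swap_evenodd_node builds. Both function names are illustrative and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for complex_perm_kinds_t.  */
enum perm_kind { PERM_UNKNOWN, PERM_EVENODD, PERM_ODDEVEN,
                 PERM_ODDODD, PERM_EVENEVEN };

/* Classify the load permutation LOADS[0..N-1].  Candidates are eliminated
   lane by lane, and the first survivor in the order EVENODD, ODDEVEN,
   ODDODD, EVENEVEN is returned.  */
static enum perm_kind
classify_loads (const unsigned *loads, size_t n)
{
  if (n == 0)
    return PERM_UNKNOWN;

  enum perm_kind cand[4] = { PERM_EVENODD, PERM_ODDEVEN,
                             PERM_ODDODD, PERM_EVENEVEN };
  for (size_t i = 0; i < n; i++)
    {
      unsigned load = loads[i];
      if (load != i)                                /* in order  */
        cand[0] = PERM_UNKNOWN;
      if (load != (i % 2 == 0 ? i + 1 : i - 1))     /* pairwise swapped  */
        cand[1] = PERM_UNKNOWN;
      if (load != 1)                                /* all odd lane  */
        cand[2] = PERM_UNKNOWN;
      if (load != 0)                                /* all even lane  */
        cand[3] = PERM_UNKNOWN;
    }
  for (size_t k = 0; k < sizeof cand / sizeof cand[0]; k++)
    if (cand[k] != PERM_UNKNOWN)
      return cand[k];
  return PERM_UNKNOWN;
}

/* Apply the lane permute { 0[1] 0[0] 0[3] 0[2] ... } to IN, writing OUT.
   N must be even.  */
static void
swap_evenodd (const unsigned *in, unsigned *out, size_t n)
{
  assert (n % 2 == 0);
  for (size_t x = 0; x < n; x += 2)
    {
      out[x] = in[x + 1];
      out[x + 1] = in[x];
    }
}
```

With the load permutations from the example SLP tree ({ 0 1 } for `a` and { 1 0 } for `b`), the first child classifies as PERM_EVENODD and the second as PERM_ODDEVEN. That is exactly the combination matches () requires, and after the swap the second child is linear as well. A permutation such as { 3 1 } is rejected only because lane 0 is inspected.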
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 1c95ed3319bd0d2ba743e94fa1468f607b5298a1..e277fa08662d115c82d428216bc09a55d3a5087e 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -133,7 +133,7 @@ _slp_tree::~_slp_tree ()
/* Recursively free the memory allocated for the SLP tree rooted at NODE. */
-static void
+void
vect_free_slp_tree (slp_tree node)
{
int i;
@@ -177,17 +177,26 @@ vect_free_slp_instance (slp_instance instance)
/* Create an SLP node for SCALAR_STMTS. */
slp_tree
+vect_create_new_slp_node (unsigned nops, tree_code code)
+{
+ slp_tree node = new _slp_tree;
+ SLP_TREE_SCALAR_STMTS (node) = vNULL;
+ SLP_TREE_CHILDREN (node).create (nops);
+ SLP_TREE_DEF_TYPE (node) = vect_internal_def;
+ SLP_TREE_CODE (node) = code;
+ return node;
+}
+/* Create an SLP node for SCALAR_STMTS. */
+
+static slp_tree
vect_create_new_slp_node (slp_tree node,
vec<stmt_vec_info> scalar_stmts, unsigned nops)
{
SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
SLP_TREE_CHILDREN (node).create (nops);
SLP_TREE_DEF_TYPE (node) = vect_internal_def;
- if (scalar_stmts.exists ())
- {
- SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
- SLP_TREE_LANES (node) = scalar_stmts.length ();
- }
+ SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
+ SLP_TREE_LANES (node) = scalar_stmts.length ();
return node;
}
@@ -239,7 +248,7 @@ typedef struct _slp_oprnd_info
/* Allocate operands info for NOPS operands, and GROUP_SIZE def-stmts for each
operand. */
-static vec<slp_oprnd_info>
+static vec<slp_oprnd_info>
vect_create_oprnd_info (int nops, int group_size)
{
int i;
@@ -1127,7 +1136,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
{
if (dump_enabled_p ())
{
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"Build SLP failed: different operation "
"in stmt %G", stmt);
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2203,6 +2212,84 @@ calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
return exact_div (common_multiple (nunits, group_size), group_size);
}
+/* Helper function of vect_match_slp_patterns.
+
+ Attempts to match patterns against the SLP tree rooted in REF_NODE using
+ VINFO. Patterns are matched in post-order traversal.
+
+ Return true if any pattern matched, in which case the tree rooted in
+ REF_NODE has been modified in place; otherwise it is left unchanged. */
+
+static bool
+vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
+ slp_tree_to_load_perm_map_t *perm_cache,
+ hash_set<slp_tree> *visited)
+{
+ unsigned i;
+ slp_tree node = *ref_node;
+ bool found_p = false;
+ if (!node || visited->add (node))
+ return false;
+
+ slp_tree child;
+ FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
+ found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
+ vinfo, perm_cache, visited);
+
+ for (unsigned x = 0; x < num__slp_patterns; x++)
+ {
+ vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
+ if (pattern)
+ {
+ pattern->build (vinfo);
+ delete pattern;
+ found_p = true;
+ }
+ }
+
+ return found_p;
+}
+
+/* Applies pattern matching to the SLP tree of INSTANCE using
+ vec_info VINFO.
+
+ Return true if any pattern matched, in which case the tree has been modified
+ in place. Patterns are tried in order and multiple patterns may match. */
+
+static bool
+vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
+ hash_set<slp_tree> *visited,
+ slp_tree_to_load_perm_map_t *perm_cache,
+ scalar_stmts_to_slp_tree_map_t * /* bst_map */)
+{
+ DUMP_VECT_SCOPE ("vect_match_slp_patterns");
+ slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
+
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Analyzing SLP tree %p for patterns\n",
+ SLP_INSTANCE_TREE (instance));
+
+ bool found_p
+ = vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
+
+ if (found_p)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "Pattern matched SLP tree\n");
+ vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
+ }
+ }
+
+ return found_p;
+}
+
+/* Analyze an SLP instance starting from a group of grouped stores. Call
+ vect_build_slp_tree to build a tree of packed stmts if possible.
+ Return FALSE if it's impossible to SLP any stmt in the loop. */
+
static bool
vect_analyze_slp_instance (vec_info *vinfo,
scalar_stmts_to_slp_tree_map_t *bst_map,
@@ -2568,6 +2655,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
{
unsigned int i;
stmt_vec_info first_element;
+ slp_instance instance;
DUMP_VECT_SCOPE ("vect_analyze_slp");
@@ -2627,6 +2715,13 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
slp_inst_kind_reduc_group, max_tree_size);
}
+ hash_set<slp_tree> visited_patterns;
+ slp_tree_to_load_perm_map_t perm_cache;
+ /* See if any patterns can be found in the SLP tree. */
+ FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
+ vect_match_slp_patterns (instance, vinfo, &visited_patterns, &perm_cache,
+ bst_map);
+
/* The map keeps a reference on SLP nodes built, release that. */
for (scalar_stmts_to_slp_tree_map_t::iterator it = bst_map->begin ();
it != bst_map->end (); ++it)
@@ -3952,7 +4047,7 @@ vect_bb_partition_graph (bb_vec_info bb_vinfo)
and return it. Do not account defs that are marked in LIFE and
update LIFE according to uses of NODE. */
-static void
+static void
vect_bb_slp_scalar_cost (vec_info *vinfo,
slp_tree node, vec<bool, va_heap> *life,
stmt_vector_for_cost *cost_vec,
@@ -3963,7 +4058,7 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
slp_tree child;
if (visited.add (node))
- return;
+ return;
FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
{
@@ -4394,7 +4489,7 @@ vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal,
{
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"Failed to SLP the basic block.\n");
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"not vectorized: failed to find SLP opportunities "
"in basic block.\n");
}
@@ -5276,7 +5371,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
if (!analyze_only)
{
tree mask_vec = NULL_TREE;
-
+
if (! noop_p)
mask_vec = vect_gen_perm_mask_checked (vectype, indices);
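The control flow of the new vect_match_slp_patterns_2 driver can be illustrated outside GCC. The sketch below is not GCC code: the `Node` type, the `match_double_neg` matcher and the `matchers` table are hypothetical stand-ins for `slp_tree`, a `vect_pattern` subclass and `slp_patterns[]`. It mirrors only the structure of the patch: a post-order walk, a visited set so shared nodes in the graph are processed once, and every matcher tried in order on the possibly rewritten node.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an SLP node: an operation and its operands.
enum Op { LEAF, NEG, PLUS };

struct Node {
  Op op;
  std::vector<Node *> children;
};

// Hypothetical matcher type, loosely modelled on vect_pattern_decl_t:
// inspect *REF and, if the pattern applies, rewrite *REF in place.
using Matcher = bool (*) (Node **ref);

// Example matcher: NEG (NEG (x)) -> x.  The edge that pointed at the
// outer NEG is redirected to x, which is why a Node ** is passed.
static bool
match_double_neg (Node **ref)
{
  Node *n = *ref;
  if (n->op != NEG || n->children.size () != 1)
    return false;
  Node *inner = n->children[0];
  if (inner->op != NEG || inner->children.size () != 1)
    return false;
  *ref = inner->children[0];
  return true;
}

// Registry of matchers, tried in this order.
static const Matcher matchers[] = { match_double_neg };
static const std::size_t num_matchers
  = sizeof (matchers) / sizeof (matchers[0]);

// Post-order walk: children are rewritten before their parent is
// inspected, and a node reachable along several paths is visited once.
static bool
match_patterns_2 (Node **ref, std::unordered_set<Node *> &visited)
{
  Node *node = *ref;
  if (!node || !visited.insert (node).second)
    return false;

  bool found = false;
  for (Node *&child : node->children)
    found |= match_patterns_2 (&child, visited);

  // Several matchers may fire on the same node.
  for (std::size_t i = 0; i < num_matchers; ++i)
    if (matchers[i] (ref))
      found = true;

  return found;
}
```

As in the patch, where one `visited_patterns` set and one `perm_cache` are shared across all SLP instances, reusing the same `visited` set for a second walk means nodes already processed are skipped rather than matched again.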
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index d130ba67c5ed55e4d019f7a19a2e8ba05f25cea0..66e6b501d52347258ed20e6f40f88fcfae71bc4c 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -26,6 +26,7 @@ typedef class _stmt_vec_info *stmt_vec_info;
#include "tree-data-ref.h"
#include "tree-hash-traits.h"
#include "target.h"
+#include "internal-fn.h"
/* Used for naming of new temporaries. */
@@ -2008,7 +2009,8 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
vec<tree>, unsigned int, vec<tree> &);
extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
extern bool vect_update_shared_vectype (stmt_vec_info, tree);
-extern slp_tree vect_create_new_slp_node (vec<stmt_vec_info>, unsigned);
+extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
+extern void vect_free_slp_tree (slp_tree);
/* In tree-vect-patterns.c. */
extern void
@@ -2025,4 +2027,84 @@ void vect_free_loop_info_assumptions (class loop *);
gimple *vect_loop_vectorized_call (class loop *, gcond **cond = NULL);
bool vect_stmt_dominates_stmt_p (gimple *, gimple *);
+/* SLP Pattern matcher types, tree-vect-slp-patterns.c. */
+
+/* Forward declaration of the possible two-operand operations that can be
+ matched by the complex number pattern matchers. */
+enum _complex_operation : unsigned;
+
+/* All possible load permute values that could result from the partial data-flow
+ analysis. */
+typedef enum _complex_perm_kinds {
+ PERM_UNKNOWN,
+ PERM_EVENODD,
+ PERM_ODDEVEN,
+ PERM_ODDODD,
+ PERM_EVENEVEN,
+ /* Can be combined with any other PERM values. */
+ PERM_TOP
+} complex_perm_kinds_t;
+
+/* A pair with a load permute and a corresponding complex_perm_kind which gives
+ information about the load it represents. */
+typedef std::pair<complex_perm_kinds_t, load_permutation_t>
+ complex_load_perm_t;
+
+/* Cache from nodes to the load permutation they represent. */
+typedef hash_map <slp_tree, complex_load_perm_t>
+ slp_tree_to_load_perm_map_t;
+
+/* Vector pattern matcher base class. All SLP pattern matchers must inherit
+ from this type. */
+
+class vect_pattern
+{
+ protected:
+ /* The number of arguments that the IFN requires. */
+ unsigned m_num_args;
+
+ /* The internal function that will be used when a pattern is created. */
+ internal_fn m_ifn;
+
+ /* The current node being inspected. */
+ slp_tree *m_node;
+
+ /* The list of operands to be the children for the node produced when the
+ internal function is created. */
+ vec<slp_tree> m_ops;
+
+ /* Constructor where NODE is the root of the tree to inspect, M_OPS holds the
+ operands for the new node and IFN is the internal function to use. */
+ vect_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
+ {
+ this->m_ifn = ifn;
+ this->m_node = node;
+ this->m_ops.create (0);
+ this->m_ops.safe_splice (*m_ops);
+ }
+
+ public:
+
+ /* Create a new instance of the pattern matcher class of the given type. */
+ static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+
+ /* Build the pattern from the data collected so far. */
+ virtual void build (vec_info *) = 0;
+
+ /* Default destructor. */
+ virtual ~vect_pattern ()
+ {
+ this->m_ops.release ();
+ }
+};
+
+/* Function pointer to create a new pattern matcher from a generic type. */
+typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
+ slp_tree *);
+
+/* List of supported pattern matchers. */
+extern vect_pattern_decl_t slp_patterns[];
+
+/* Number of supported pattern matchers. */
+extern size_t num__slp_patterns;
+
#endif /* GCC_TREE_VECTORIZER_H */
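The header introduces `complex_perm_kinds_t` without spelling out how a load permutation maps to each kind, and the classification code itself is not part of this excerpt. One plausible reading, assumed here and not taken from the patch, is that the names describe the parity of the element each lane loads when real and imaginary parts are interleaved (real parts at even indices, imaginary parts at odd ones). The hypothetical `classify_perm` below implements only that reading. It omits the per-node caching that `slp_tree_to_load_perm_map_t` provides and never produces `PERM_TOP`.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the enumerators added to tree-vectorizer.h.
enum perm_kind
{
  PERM_UNKNOWN,
  PERM_EVENODD,
  PERM_ODDEVEN,
  PERM_ODDODD,
  PERM_EVENEVEN,
  PERM_TOP
};

// Hypothetical classifier.  Assumes PERM[i] is the index of the scalar
// element loaded into lane i, with real parts at even indices and
// imaginary parts at odd indices.  Only the parity of each index is used.
static perm_kind
classify_perm (const std::vector<unsigned> &perm)
{
  if (perm.empty ())
    return PERM_UNKNOWN;

  bool evenodd = true, oddeven = true, eveneven = true, oddodd = true;
  for (std::size_t i = 0; i < perm.size (); ++i)
    {
      bool lane_even = (i % 2) == 0;
      bool elt_even = (perm[i] % 2) == 0;
      evenodd = evenodd && (elt_even == lane_even);  // real, imag, ...
      oddeven = oddeven && (elt_even != lane_even);  // imag, real, ...
      eveneven = eveneven && elt_even;               // real parts only
      oddodd = oddodd && !elt_even;                  // imag parts only
    }

  // With a single lane both EVENODD and EVENEVEN (or ODDEVEN and
  // ODDODD) hold; the first match in this order wins.
  if (evenodd)
    return PERM_EVENODD;
  if (oddeven)
    return PERM_ODDEVEN;
  if (eveneven)
    return PERM_EVENEVEN;
  if (oddodd)
    return PERM_ODDODD;
  return PERM_UNKNOWN;
}
```

Under this reading, `{1, 0, 3, 2}` (imaginary and real parts swapped in each pair) classifies as `PERM_ODDEVEN`, which is the operand shape a rotation by 90 degrees would need.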
Thread overview: 18+ messages
2020-11-23 11:26 Tamar Christina
2020-11-23 15:50 ` Richard Biener
2020-11-23 17:05 ` Tamar Christina
2020-11-24 9:30 ` Richard Biener
2020-11-24 10:53 ` Richard Biener
2020-11-24 11:19 ` Tamar Christina
2020-11-24 12:24 ` Richard Biener
2020-11-24 13:03 ` Tamar Christina
2020-11-24 14:14 ` Richard Biener
2020-11-24 14:36 ` Richard Biener
2020-11-26 17:48 ` Tamar Christina
2020-11-27 10:30 ` Richard Biener
2020-12-03 1:02 ` Tamar Christina
2020-12-03 13:02 ` Richard Biener
2020-12-10 16:59 ` Tamar Christina [this message]
2020-11-24 10:58 ` Tamar Christina
2020-11-24 11:37 ` Richard Biener
2020-11-24 12:54 ` Hongtao Liu