* [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
@ 2021-12-17 15:42 Tamar Christina
From: Tamar Christina @ 2021-12-17 15:42 UTC
  To: gcc-patches; +Cc: nd, rguenther

Hi All,

This patch strengthens the analysis for complex MUL, FMA and FMS to ensure
that it does not produce incorrect output.

Essentially, it adds an extra verification step that checks that the two nodes
it is going to combine perform the same operations on compatible values.  This
is needed because, if one computation differs from the other, the current
implementation has no way to deal with it, since it has to remove the
permute.
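The verification described above amounts to a memoized structural comparison
of two SLP subtrees.  A minimal sketch, assuming a hypothetical `Node` type
that only records an operation name and its children (real SLP nodes also
carry loads, externals and constants, which compatible_complex_nodes_p
compares separately):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for an SLP node.
struct Node
{
  std::string op;                      // e.g. "mult", "plus", "x"
  std::vector<const Node *> children;
};

using NodePair = std::pair<const Node *, const Node *>;

// Return true if A and B apply the same operation to pairwise-compatible
// children.  Results are memoized in CACHE.  Like compat_cache in the patch,
// the pair is first recorded as incompatible, so a pair that is reached again
// while it is still being checked is treated as incompatible.
bool
compatible_p (std::map<NodePair, bool> &cache, const Node *a, const Node *b)
{
  NodePair key (a, b);
  auto it = cache.find (key);
  if (it != cache.end ())
    return it->second;
  cache[key] = false;

  if (a->op != b->op || a->children.size () != b->children.size ())
    return false;

  for (std::size_t i = 0; i < a->children.size (); i++)
    if (!compatible_p (cache, a->children[i], b->children[i]))
      return false;

  cache[key] = true;
  return true;
}
```

In pr102819-6.c, for example, the two multiplies read `f[2]` and `f[3]`
respectively, which is the kind of mismatch such a check is meant to reject.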

Once we can keep the permute around, we can probably handle these cases by
unrolling.

Since I have to do the traversal anyway, I took advantage of it to simplify
the code a bit.  Previously we would determine whether something is a
conjugate, then try to figure out which conjugate it is, and then check
whether the permutes match what we expect.

Now the code that does the traversal detects all of this in one go and
returns whether the operation can be combined and whether a conjugate is
present.
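For intuition on why a conjugate can be reported alongside an otherwise
identical match: multiplying by conj (b) uses the same four lane products as a
plain multiply, only with b's imaginary lane negated.  In the patch this
negation appears as a NEGATE_EXPR node that vect_validate_multiplication
absorbs.  A minimal scalar sketch (not the SLP form), where the hypothetical
`Vec2` holds {real, imag}:

```cpp
#include <array>
#include <cassert>

using Vec2 = std::array<float, 2>;  // {real, imag}

// a * b expanded over the four lane products.
Vec2
cmul (const Vec2 &a, const Vec2 &b)
{
  return { a[0] * b[0] - a[1] * b[1],
           a[0] * b[1] + a[1] * b[0] };
}

// a * conj (b): the same four products, with b's imaginary lane negated.
Vec2
cmul_conj (const Vec2 &a, const Vec2 &b)
{
  const float neg_bi = -b[1];  // the only extra operation: a negate
  return { a[0] * b[0] - a[1] * neg_bi,
           a[0] * neg_bi + a[1] * b[0] };
}
```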

Secondly, because it does this, I can now simplify the checking code itself to
essentially just apply fixed patterns to each operation.
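Applying fixed patterns can be pictured as a table lookup.  A minimal sketch,
assuming a hypothetical `PermKind` enum that mirrors the patch's
complex_perm_kinds_t values; the two rows copy the perms table added in
vect_validate_multiplication, and the four observed kinds are taken to be the
multiply operands after the patch's style reordering:

```cpp
#include <cassert>

// Hypothetical mirror of the permute kinds the patch assigns to load nodes.
enum PermKind { EVENEVEN, ODDODD, EVENODD, ODDEVEN, TOP };

// Expected kinds for the four multiply operands, one row per permute
// sequence (copied from the perms table in vect_validate_multiplication).
static const PermKind expected[2][4] = {
  { EVENEVEN, ODDODD, EVENODD, ODDEVEN },
  { EVENODD, ODDEVEN, EVENEVEN, ODDODD },
};

// Return true if OBSERVED matches row ROW.  As in the patch, the first two
// operands must match exactly, while the last two may also be TOP.
bool
matches_row (const PermKind observed[4], int row)
{
  for (int i = 0; i < 4; i++)
    {
      bool exact = observed[i] == expected[row][i];
      bool wildcard = i >= 2 && observed[i] == TOP;
      if (!exact && !wildcard)
        return false;
    }
  return true;
}
```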

The patterns represent the order in which operations should appear.  For
instance, a complex MUL operation combines:

  Left 1 + Right 1
  Left 2 + Right 2

with a permute on the nodes consisting of:

  { Even, Even } + { Odd, Odd  }
  { Even, Odd  } + { Odd, Even }

By abstracting over these patterns the checking code becomes quite simple.
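To make the MUL listing above concrete: with each complex value held as
{real, imag} (lane 0 even, lane 1 odd), the first multiply reads the left
operand as {Even, Even} and the right as {Even, Odd}, the second reads them as
{Odd, Odd} and {Odd, Even}, and the two products are combined with a subtract
in the real lane and an add in the imaginary lane.  This is a scalar sketch of
one reading of the listing, modelled on the loop in pr102819-4.c; `permute`
and `mul` are hypothetical helpers, not vectorizer functions:

```cpp
#include <array>
#include <cassert>

using Vec2 = std::array<float, 2>;  // one complex value: {real, imag}

// Two-lane permute: lane k of the result is v[ik].
static Vec2
permute (const Vec2 &v, int i0, int i1)
{
  return { v[i0], v[i1] };
}

// Lane-wise multiply.
static Vec2
mul (const Vec2 &a, const Vec2 &b)
{
  return { a[0] * b[0], a[1] * b[1] };
}

// Complex multiply built only from the permutes listed above.
Vec2
cmul_by_permutes (const Vec2 &l, const Vec2 &r)
{
  Vec2 m0 = mul (permute (l, 0, 0), permute (r, 0, 1));  // {E,E} * {E,O}
  Vec2 m1 = mul (permute (l, 1, 1), permute (r, 1, 0));  // {O,O} * {O,E}
  return { m0[0] - m1[0], m0[1] + m1[1] };               // sub / add lanes
}
```

For (1 + 2i) * (3 + 4i) this yields {-5, 10}, i.e. -5 + 10i.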

As part of this I was checking the order of the operands, which was left in
"SLP" order, i.e. the order in which they showed up during SLP, which means
that the accumulator is first.  However, it looks like I didn't document this,
and the x86 optab was implemented assuming the same order as FMA, i.e. that
the accumulator is last.

I have therefore changed the order to match that of FMA and FMS, which
corrects the x86 codegen, and I will update the Arm targets accordingly.  The
order is now also documented.
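A scalar reference for the FMA-style operand order described above, where the
two multiplicands come first and the accumulator is the last operand,
following the operand naming of the updated md.texi examples.  This is a
sketch using std::complex<float>; the function names are hypothetical and are
not the optab or internal-function names:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

using cfloat = std::complex<float>;

// Complex multiply-accumulate: op2[i] += op0[i] * op1[i].
// The accumulator (op2) is the last operand, as with FMA.
void
cfma_ref (const cfloat *op0, const cfloat *op1, cfloat *op2, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    op2[i] += op0[i] * op1[i];
}

// Complex multiply-subtract with a conjugated second multiplicand:
// op2[i] -= op0[i] * conj (op1[i]).
void
cfms_conj_ref (const cfloat *op0, const cfloat *op1, cfloat *op2,
               std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    op2[i] -= op0[i] * std::conj (op1[i]);
}
```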

Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no regressions.

OK for master, and for backport to GCC 11 after it has had some time to stew?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* doc/md.texi: Update docs for cfms, cfma.
	* tree-data-ref.h (same_data_refs): Accept optional offset.
	* tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
	patterns.
	(vect_normalize_conj_loc): Remove.
	(is_eq_or_top): Change to take two nodes.
	(enum _conj_status, compatible_complex_nodes_p,
	vect_validate_multiplication): New.
	(class complex_add_pattern, complex_add_pattern::matches,
	complex_add_pattern::recognize, class complex_mul_pattern,
	complex_mul_pattern::recognize, class complex_fms_pattern,
	complex_fms_pattern::recognize, class complex_operations_pattern,
	complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
	new cache.
	(complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
	cache and use new validation code.
	* tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
	vect_analyze_slp): Pass along cache.
	(compatible_calls_p): Expose.
	* tree-vectorizer.h (compatible_calls_p, slp_node_hash,
	slp_compat_nodes_map_t): New.
	(class vect_pattern): Update signatures to include new cache.

gcc/testsuite/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* g++.dg/vect/pr99149.cc: xfail for now.
	* gcc.dg/vect/complex/pr102819-1.c: New test.
	* gcc.dg/vect/complex/pr102819-2.c: New test.
	* gcc.dg/vect/complex/pr102819-3.c: New test.
	* gcc.dg/vect/complex/pr102819-4.c: New test.
	* gcc.dg/vect/complex/pr102819-5.c: New test.
	* gcc.dg/vect/complex/pr102819-6.c: New test.
	* gcc.dg/vect/complex/pr102819-7.c: New test.
	* gcc.dg/vect/complex/pr102819-8.c: New test.
	* gcc.dg/vect/complex/pr102819-9.c: New test.
	* gcc.dg/vect/complex/pr103169.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
 a multiply and accumulate of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * b[i];
+      op2[i] += op0[i] * op1[i];
     @}
 @end smallexample
 
@@ -6348,12 +6348,12 @@ the same as a multiply and accumulate of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * conj (b[i]);
+      op2[i] += op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
@@ -6370,12 +6370,12 @@ Perform a vector multiply and subtract that is semantically the same as
 a multiply and subtract of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * b[i];
+      op2[i] -= op0[i] * op1[i];
     @}
 @end smallexample
 
@@ -6393,12 +6393,12 @@ the same as a multiply and subtract of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * conj (b[i]);
+      op2[i] -= op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
@@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
 complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * b[i];
+      op2[i] = op0[i] * op1[i];
     @}
 @end smallexample
 
@@ -6437,12 +6437,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
 multiply of complex numbers where the second multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * conj (b[i]);
+      op2[i] = op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
--- a/gcc/testsuite/g++.dg/vect/pr99149.cc
+++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
@@ -24,4 +24,4 @@ public:
 } n;
 main() { n.j(); }
 
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 4)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
+      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
+      //                  ^^^^^^^             ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
+      //                  ^^^^^^^             ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
+      //                          ^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad3()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+      //                            ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+#include <stdio.h>
+#include <complex.h>
+
+#define N 200
+#define TYPE float
+#define TYPE2 float
+
+void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+  for (int i=0; i < N; i++)
+    {
+      c[i] -=  a[i] * b[0];
+    }
+}
+
+/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.  */
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { vect_double } } } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
+
+_Complex double b_0, c_0;
+
+void
+mul270snd (void)
+{
+  c_0 = b_0 * 1.0iF * 1.0iF;
+}
+
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
 }
 
 /* Return true when the data references A and B are accessing the same
-   memory object with the same access functions.  */
+   memory object with the same access functions.  Optionally skip the
+   last OFFSET dimensions in the data reference.  */
 
 static inline bool
-same_data_refs (data_reference_p a, data_reference_p b)
+same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
 {
   unsigned int i;
 
@@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
   if (!same_data_refs_base_objects (a, b))
     return false;
 
-  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
+  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
     if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
       return false;
 
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 0350441fad9690cd5d04337171ca3470a064a571..f8da4153632a700680091f37305a5d3078fbb0c5 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
   int valid_patterns = 4;
   FOR_EACH_VEC_ELT (loads, i, load)
     {
-      if (candidates[0] != PERM_UNKNOWN && load != 1)
+      unsigned adj_load = load % 2;
+      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
 	{
 	  candidates[0] = PERM_UNKNOWN;
 	  valid_patterns--;
 	}
-      if (candidates[1] != PERM_UNKNOWN && load != 0)
+      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
 	{
 	  candidates[1] = PERM_UNKNOWN;
 	  valid_patterns--;
@@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
 internal_fn
 complex_add_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t * /* compat_cache */,
 			      slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_add_pattern::matches (op, perm_cache, node, &ops);
+    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
  * complex_mul_pattern
  ******************************************************************************/
 
-/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
-   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
-
-   If a negate is found then the values in ARGS are reordered such that the
-   negate node is always the second one and the entry is replaced by the child
-   of the negate node.  */
+/* Check that the permutes of OP1 and OP2 are KIND1 and KIND2 or PERM_TOP.  */
 
 static inline bool
-vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL)
+is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
+	      slp_tree op1, complex_perm_kinds_t kind1,
+	      slp_tree op2, complex_perm_kinds_t kind2)
 {
-  gcc_assert (args.length () == 2);
-  bool neg_found = false;
-
-  if (vect_match_expression_p (args[0], NEGATE_EXPR))
-    {
-      std::swap (args[0], args[1]);
-      neg_found = true;
-      if (neg_first_p)
-	*neg_first_p = true;
-    }
-  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
-    {
-      neg_found = true;
-      if (neg_first_p)
-	*neg_first_p = false;
-    }
+  complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
+  if (perm1 != kind1 && perm1 != PERM_TOP)
+    return false;
 
-  if (neg_found)
-    args[1] = SLP_TREE_CHILDREN (args[1])[0];
+  complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
+  if (perm2 != kind2 && perm2 != PERM_TOP)
+    return false;
 
-  return neg_found;
+  return true;
 }
 
-/* Helper function to check if PERM is KIND or PERM_TOP.  */
+enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
 
 static inline bool
-is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
+compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
+			    slp_tree a, int *pa, slp_tree b, int *pb)
 {
-  return perm == kind || perm == PERM_TOP;
-}
+  bool *tmp;
+  std::pair<slp_tree, slp_tree> key = std::make_pair (a, b);
+  if ((tmp = compat_cache->get (key)) != NULL)
+    return *tmp;
 
-/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
-   nodes but also that they represent an operation that is either a complex
-   multiplication or a complex multiplication by conjugated value.
+  compat_cache->put (key, false);
 
-   Of the negation is expected to be in the first half of the tree (As required
-   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjugate
-   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
-   second operand contains the conjugate operation.  */
+  if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
+    return false;
 
-static inline bool
-vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
-			      const vec<slp_tree> &left_op,
-			      const vec<slp_tree> &right_op,
-			     bool neg_first, bool *conj_first_operand,
-			     bool fms)
-{
-  /* The presence of a negation indicates that we have either a conjugate or a
-     rotation.  We need to distinguish which one.  */
-  *conj_first_operand = false;
-  complex_perm_kinds_t kind;
-
-  /* Complex conjugates have the negation on the imaginary part of the
-     number where rotations affect the real component.  So check if the
-     negation is on a dup of lane 1.  */
-  if (fms)
+  if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
+    return false;
+
+  /* Only internal nodes can be loads, as such we can't check further if they
+     are externals.  */
+  if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
     {
-      /* Canonicalization for fms is not consistent. So have to test both
-	 variants to be sure.  This needs to be fixed in the mid-end so
-	 this part can be simpler.  */
-      kind = linear_loads_p (perm_cache, right_op[0]);
-      if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
-	   && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
-			     PERM_ODDEVEN))
-	  || (kind == PERM_ODDEVEN
-	      && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
-			     PERM_ODDODD))))
-	return false;
+      for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
+	{
+	  tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
+	  tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
+	  if (!operand_equal_p (op1, op2, 0))
+	    return false;
+	}
+
+      compat_cache->put (key, true);
+      return true;
+    }
+
+  auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
+  auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
+
+  if (gimple_code (a_stmt) != gimple_code (b_stmt))
+    return false;
+
+  /* code, children, type, externals, loads, constants  */
+  if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
+    return false;
+
+  /* At this point, a and b are known to be the same gimple operations.  */
+  if (is_gimple_call (a_stmt))
+    {
+      if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
+			       dyn_cast <gcall *> (b_stmt)))
+	return false;
     }
+  else if (!is_gimple_assign (a_stmt))
+    return false;
   else
     {
-      if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
-	  && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
-			    PERM_ODDEVEN))
+      tree_code acode = gimple_assign_rhs_code (a_stmt);
+      tree_code bcode = gimple_assign_rhs_code (b_stmt);
+      if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
+	  && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
+	return true;
+
+      if (acode != bcode)
 	return false;
     }
 
-  /* Deal with differences in indexes.  */
-  int index1 = fms ? 1 : 0;
-  int index2 = fms ? 0 : 1;
-
-  /* Check if the conjugate is on the second first or second operand.  The
-     order of the node with the conjugate value determines this, and the dup
-     node must be one of lane 0 of the same DR as the neg node.  */
-  kind = linear_loads_p (perm_cache, left_op[index1]);
-  if (kind == PERM_TOP)
+  if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
+      || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
     {
-      if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
-	return true;
+      for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
+	{
+	  tree t1 = gimple_arg (a_stmt, i);
+	  tree t2 = gimple_arg (b_stmt, i);
+	  if (TREE_CODE (t1) != TREE_CODE (t2))
+	    return false;
+
+	  /* If SSA name then we will need to inspect the children
+	     so we can punt here.  */
+	  if (TREE_CODE (t1) == SSA_NAME)
+	    continue;
+
+	  if (!operand_equal_p (t1, t2, 0))
+	    return false;
+	}
     }
-  else if (kind == PERM_EVENODD && !neg_first)
+  else
     {
-      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN)
+      auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
+      auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
+      /* Don't check the last dimension as that's checked by the linearity
+	 checks.  This check is also much stricter than what we need
+	 because it doesn't consider loading from adjacent elements
+	 in the same struct as loading from the same base object.
+	 But for now, I'll play it safe.  */
+      if (!same_data_refs (dr1, dr2, 1))
 	return false;
-      return true;
     }
-  else if (kind == PERM_EVENEVEN && neg_first)
+
+  for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
     {
-      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD)
+      if (!compatible_complex_nodes_p (compat_cache,
+				       SLP_TREE_CHILDREN (a)[i], pa,
+				       SLP_TREE_CHILDREN (b)[i], pb))
 	return false;
-
-      *conj_first_operand = true;
-      return true;
     }
-  else
-    return false;
-
-  if (kind != PERM_EVENEVEN)
-    return false;
 
+  compat_cache->put (key, true);
   return true;
 }
 
-/* Helper function to help distinguish between a conjugate and a rotation in a
-   complex multiplication.  The operations have similar shapes but the order of
-   the load permutes are different.  This function returns TRUE when the order
-   is consistent with a multiplication or multiplication by conjugated
-   operand but returns FALSE if it's a multiplication by rotated operand.  */
-
 static inline bool
 vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
-			      const vec<slp_tree> &op,
-			      complex_perm_kinds_t permKind)
+			      slp_compat_nodes_map_t *compat_cache,
+			      vec<slp_tree> &left_op,
+			      vec<slp_tree> &right_op,
+			      bool subtract,
+			      enum _conj_status *_status)
 {
-  /* The left node is the more common case, test it first.  */
-  if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
+  auto_vec<slp_tree> ops;
+  enum _conj_status stats = CONJ_NONE;
+
+  /* The complex operations can occur in two layouts and two permute sequences
+     so declare them and re-use them.  */
+  int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}.  */
+		    , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}.  */
+		    };
+
+  /* Now for the corresponding permutes that go with these values.  */
+  complex_perm_kinds_t perms[][4]
+    = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
+      , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
+      };
+
+  /* These permutes are used during comparisons of externals on which
+     we require strict equality.  */
+  int cq[][4][2]
+    = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
+      , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
+      };
+
+  /* Default to style and perm 0, most operations use this one.  */
+  int style = 0;
+  int perm = subtract ? 1 : 0;
+
+  /* Check if we have a negate operation, if so absorb the node and continue
+     looking.  */
+  bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
+  bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
+
+  /* Determine which style we're looking at.  We only have different ones
+     whenever a conjugate is involved.  */
+  if (neg0 && neg1)
+    ;
+  else if (neg0)
     {
-      if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
-	return false;
+      right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
+      stats = CONJ_FST;
+      if (subtract)
+	perm = 0;
     }
-  return true;
+  else if (neg1)
+    {
+      right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
+      stats = CONJ_SND;
+      perm = 1;
+    }
+
+  *_status = stats;
+
+  /* Flatten the inputs after we've remapped them.  */
+  ops.create (4);
+  ops.safe_splice (left_op);
+  ops.safe_splice (right_op);
+
+  /* Extract out the elements to check.  */
+  slp_tree op0 = ops[styles[style][0]];
+  slp_tree op1 = ops[styles[style][1]];
+  slp_tree op2 = ops[styles[style][2]];
+  slp_tree op3 = ops[styles[style][3]];
+
+  /* Do cheapest test first.  If failed no need to analyze further.  */
+  if (linear_loads_p (perm_cache, op0) != perms[perm][0]
+      || linear_loads_p (perm_cache, op1) != perms[perm][1]
+      || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
+    return false;
+
+  return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
+				     cq[perm][1])
+	 && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
+					cq[perm][3]);
 }
 
 /* This function combines two nodes containing only even and only odd lanes
@@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern
 internal_fn
 complex_mul_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t *compat_cache,
 			      slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op,
       || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
     return IFN_LAST;
 
-  bool neg_first = false;
-  bool conj_first_operand = false;
-  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
+  enum _conj_status status;
+  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
+				     right_op, false, &status))
+    return IFN_LAST;
 
-  if (!is_neg)
+  if (status == CONJ_NONE)
     {
-      /* A multiplication needs to multiply agains the real pair, otherwise
-	 the pattern matches that of FMS.   */
-      if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
-	  || vect_normalize_conj_loc (left_op))
-	return IFN_LAST;
       if (add0)
 	ifn = IFN_COMPLEX_FMA;
       else
@@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op,
     }
   else
     {
-      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
-					 neg_first, &conj_first_operand,
-					 false))
-	return IFN_LAST;
-
       if(add0)
 	ifn = IFN_COMPLEX_FMA_CONJ;
       else
@@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op,
     ops->quick_push (add0);
 
   complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
-  if (kind == PERM_EVENODD)
-    {
-      ops->quick_push (left_op[1]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (left_op[0]);
-    }
-  else if (kind == PERM_TOP)
+  if (kind == PERM_EVENODD || kind == PERM_TOP)
     {
       ops->quick_push (left_op[1]);
       ops->quick_push (right_op[1]);
       ops->quick_push (left_op[0]);
     }
-  else if (kind == PERM_EVENEVEN && !conj_first_operand)
+  else if (kind == PERM_EVENEVEN && status != CONJ_SND)
     {
       ops->quick_push (left_op[0]);
       ops->quick_push (right_op[0]);
@@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_mul_pattern::matches (op, perm_cache, node, &ops);
+    = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -1097,8 +1156,8 @@ complex_mul_pattern::build (vec_info *vinfo)
 
 	/* First re-arrange the children.  */
 	SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2);
-	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2];
-	SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[2];
 	break;
       }
     case IFN_COMPLEX_FMA:
@@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo)
 
 	/* First re-arrange the children.  */
 	SLP_TREE_CHILDREN (*this->m_node).safe_grow (3);
-	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0];
+	SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
 	SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3];
-	SLP_TREE_CHILDREN (*this->m_node)[2] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0];
 
 	/* Tell the builder to expect an extra argument.  */
 	this->m_num_args++;
@@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern
 internal_fn
 complex_fms_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t *compat_cache,
 			      slp_tree * ref_node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op,
   if (!vect_match_expression_p (root, MINUS_EXPR))
     return IFN_LAST;
 
+  /* TODO: Support invariants here.  With the new layout CADD can now
+     match before we get a chance to try CFMS.  */
   auto nodes = SLP_TREE_CHILDREN (root);
   if (!vect_match_expression_p (nodes[1], MULT_EXPR)
       || vect_detect_pair_op (nodes[0]) != PLUS_MINUS)
@@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op,
       || !vect_match_expression_p (l0node[1], MULT_EXPR))
     return IFN_LAST;
 
-  bool is_neg = vect_normalize_conj_loc (left_op);
-
-  bool conj_first_operand = false;
-  if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
-				     &conj_first_operand, true))
+  enum _conj_status status;
+  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
+				     left_op, true, &status))
     return IFN_LAST;
 
-  if (!is_neg)
+  if (status == CONJ_NONE)
     ifn = IFN_COMPLEX_FMS;
-  else if (is_neg)
+  else
     ifn = IFN_COMPLEX_FMS_CONJ;
 
   if (!vect_pattern_validate_optab (ifn, *ref_node))
@@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op,
       ops->quick_push (right_op[1]);
       ops->quick_push (left_op[1]);
     }
-  else if (kind == PERM_TOP)
-    {
-      ops->quick_push (l0node[0]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[0]);
-    }
-  else if (kind == PERM_EVENEVEN && !is_neg)
-    {
-      ops->quick_push (l0node[0]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[0]);
-    }
   else
     {
       ops->quick_push (l0node[0]);
       ops->quick_push (right_op[1]);
       ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[1]);
+      ops->quick_push (left_op[0]);
     }
 
   return ifn;
@@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_fms_pattern::matches (op, perm_cache, node, &ops);
+    = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -1305,9 +1353,24 @@ complex_fms_pattern::build (vec_info *vinfo)
   SLP_TREE_CHILDREN (*this->m_node).create (3);
 
   /* First re-arrange the children.  */
+  switch (this->m_ifn)
+  {
+    case IFN_COMPLEX_FMS:
+      {
+	SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
+	SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
+	break;
+      }
+    case IFN_COMPLEX_FMS_CONJ:
+      {
+	SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
+	SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
+	break;
+      }
+    default:
+      gcc_unreachable ();
+  }
   SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
-  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
-  SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
 
   /* And then rewrite the node itself.  */
   complex_pattern::build (vinfo);
@@ -1334,11 +1397,12 @@ class complex_operations_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 };
 
 /* Dummy matches implementation for proxy object.  */
@@ -1347,6 +1411,7 @@ internal_fn
 complex_operations_pattern::
 matches (complex_operation_t /* op */,
 	 slp_tree_to_load_perm_map_t * /* perm_cache */,
+	 slp_compat_nodes_map_t * /* compat_cache */,
 	 slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
 {
   return IFN_LAST;
@@ -1356,6 +1421,7 @@ matches (complex_operation_t /* op */,
 
 vect_pattern*
 complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				       slp_compat_nodes_map_t *ccache,
 				       slp_tree *node)
 {
   auto_vec<slp_tree> ops;
@@ -1363,15 +1429,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn = IFN_LAST;
 
-  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_fms_pattern::mkInstance (node, &ops, ifn);
 
-  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_mul_pattern::mkInstance (node, &ops, ifn);
 
-  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_add_pattern::mkInstance (node, &ops, ifn);
 
@@ -1398,11 +1464,13 @@ class addsub_pattern : public vect_pattern
     void build (vec_info *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 };
 
 vect_pattern *
-addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
+addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
+			   slp_compat_nodes_map_t *, slp_tree *node_)
 {
   slp_tree node = *node_;
   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
 /* Return true if call statements CALL1 and CALL2 are similar enough
    to be combined into the same SLP group.  */
 
-static bool
+bool
 compatible_calls_p (gcall *call1, gcall *call2)
 {
   unsigned int nargs = gimple_call_num_args (call1);
@@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
 static bool
 vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
 			   slp_tree_to_load_perm_map_t *perm_cache,
+			   slp_compat_nodes_map_t *compat_cache,
 			   hash_set<slp_tree> *visited)
 {
   unsigned i;
@@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
   slp_tree child;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
     found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
-					  vinfo, perm_cache, visited);
+					  vinfo, perm_cache, compat_cache,
+					  visited);
 
   for (unsigned x = 0; x < num__slp_patterns; x++)
     {
-      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
+      vect_pattern *pattern
+	= slp_patterns[x] (perm_cache, compat_cache, ref_node);
       if (pattern)
 	{
 	  pattern->build (vinfo);
@@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
 static bool
 vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 			 hash_set<slp_tree> *visited,
-			 slp_tree_to_load_perm_map_t *perm_cache)
+			 slp_tree_to_load_perm_map_t *perm_cache,
+			 slp_compat_nodes_map_t *compat_cache)
 {
   DUMP_VECT_SCOPE ("vect_match_slp_patterns");
   slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
@@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 		     "Analyzing SLP tree %p for patterns\n",
 		     SLP_INSTANCE_TREE (instance));
 
-  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
+  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
+				    visited);
 }
 
 /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
@@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 
   hash_set<slp_tree> visited_patterns;
   slp_tree_to_load_perm_map_t perm_cache;
+  slp_compat_nodes_map_t compat_cache;
 
   /* See if any patterns can be found in the SLP tree.  */
   bool pattern_found = false;
   FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
     pattern_found |= vect_match_slp_patterns (instance, vinfo,
-					      &visited_patterns, &perm_cache);
+					      &visited_patterns, &perm_cache,
+					      &compat_cache);
 
   /* If any were found optimize permutations of loads.  */
   if (pattern_found)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
 extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
 extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
 extern void vect_free_slp_tree (slp_tree);
+extern bool compatible_calls_p (gcall *, gcall *);
 
 /* In tree-vect-patterns.c.  */
 extern void
@@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds {
 typedef hash_map <slp_tree, complex_perm_kinds_t>
   slp_tree_to_load_perm_map_t;
 
+/* Cache mapping a pair of SLP nodes to whether they are compatible.  */
+typedef pair_hash <nofree_ptr_hash <_slp_tree>,
+		   nofree_ptr_hash <_slp_tree>> slp_node_hash;
+typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
+
+
 /* Vector pattern matcher base class.  All SLP pattern matchers must inherit
    from this type.  */
 
@@ -2338,7 +2345,8 @@ class vect_pattern
   public:
 
     /* Create a new instance of the pattern matcher class of the given type.  */
-    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
+				    slp_compat_nodes_map_t *, slp_tree *);
 
     /* Build the pattern from the data collected so far.  */
     virtual void build (vec_info *) = 0;
@@ -2352,6 +2360,7 @@ class vect_pattern
 
 /* Function pointer to create a new pattern matcher from a generic type.  */
 typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
+					      slp_compat_nodes_map_t *,
 					      slp_tree *);
 
 /* List of supported pattern matchers.  */


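The new slp_compat_nodes_map_t cache is keyed on an ordered pair of SLP nodes. The self-contained C++ sketch below illustrates the memoization pattern that compatible_complex_nodes_p follows. It uses a hypothetical Node type, not GCC's _slp_tree. As in the patch, the pair is first recorded as incompatible and is only set to true once every check has passed. Presumably this stops a walk that revisits the same pair from recursing again. Comparing the children pairwise is an assumption of this sketch, not something taken from the hunks above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SLP node: an operation code plus children.
// This is not GCC's _slp_tree.
struct Node
{
  int code;
  std::vector<const Node *> children;
};

// The cache key is an ordered pair of nodes, in the spirit of slp_node_hash.
using Key = std::pair<const Node *, const Node *>;

// Memoized structural comparison.  The pair is recorded as incompatible
// before any checks run and only set to true once they all pass, so a walk
// that re-enters the same pair gets the provisional false and stops.
bool
compatible_p (std::map<Key, bool> &cache, const Node *a, const Node *b)
{
  Key key (a, b);
  auto it = cache.find (key);
  if (it != cache.end ())
    return it->second;

  cache[key] = false;

  if (a->code != b->code || a->children.size () != b->children.size ())
    return false;

  // Assumption of this sketch: children are compared pairwise.
  for (std::size_t i = 0; i < a->children.size (); i++)
    if (!compatible_p (cache, a->children[i], b->children[i]))
      return false;

  cache[key] = true;
  return true;
}
```

One consequence of seeding the entry with false is that a cycle is conservatively reported as incompatible: re-entering the pair returns the provisional value instead of recursing forever.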
-- 

[-- Attachment #2: rb15145.patch --]
[-- Type: text/x-diff, Size: 41634 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
 a multiply and accumulate of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * b[i];
+      op2[i] += op0[i] * op1[i];
     @}
 @end smallexample
 
@@ -6348,12 +6348,12 @@ the same as a multiply and accumulate of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * conj (b[i]);
+      op2[i] += op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
@@ -6370,12 +6370,12 @@ Perform a vector multiply and subtract that is semantically the same as
 a multiply and subtract of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * b[i];
+      op2[i] -= op0[i] * op1[i];
     @}
 @end smallexample
 
@@ -6393,12 +6393,12 @@ the same as a multiply and subtract of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * conj (b[i]);
+      op2[i] -= op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
@@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
 complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * b[i];
+      op2[i] = op0[i] * op1[i];
     @}
 @end smallexample
 
@@ -6437,12 +6437,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
 multiply of complex numbers where the second multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * conj (b[i]);
+      op2[i] = op0[i] * conj (op1[i]);
     @}
 @end smallexample
 
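For reference, each of the six loops documented above can be written out per element in terms of real and imaginary parts. They cover multiply, multiply-accumulate, and multiply-subtract, each with and without conjugating the second multiply operand. The C++ sketch below uses std::complex purely as a scalar model. The ref_* function names are hypothetical; they only mirror the internal functions the vectorizer matches (COMPLEX_MUL, COMPLEX_FMA, COMPLEX_FMS and their _CONJ variants). The sketch states the arithmetic only and says nothing about any target's lane layout.

```cpp
#include <cassert>
#include <complex>

using cplx = std::complex<double>;

// op0 * op1 expanded into real and imaginary parts.
cplx
ref_mul (cplx op0, cplx op1)
{
  return cplx (op0.real () * op1.real () - op0.imag () * op1.imag (),
               op0.real () * op1.imag () + op0.imag () * op1.real ());
}

// op0 * conj (op1): negating op1's imaginary part flips the sign of every
// term that uses it.
cplx
ref_mul_conj (cplx op0, cplx op1)
{
  return cplx (op0.real () * op1.real () + op0.imag () * op1.imag (),
               op0.imag () * op1.real () - op0.real () * op1.imag ());
}

// op2 += op0 * op1 and op2 += op0 * conj (op1).
cplx ref_fma (cplx op0, cplx op1, cplx op2) { return op2 + ref_mul (op0, op1); }
cplx ref_fma_conj (cplx op0, cplx op1, cplx op2) { return op2 + ref_mul_conj (op0, op1); }

// op2 -= op0 * op1 and op2 -= op0 * conj (op1).
cplx ref_fms (cplx op0, cplx op1, cplx op2) { return op2 - ref_mul (op0, op1); }
cplx ref_fms_conj (cplx op0, cplx op1, cplx op2) { return op2 - ref_mul_conj (op0, op1); }
```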
diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
--- a/gcc/testsuite/g++.dg/vect/pr99149.cc
+++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
@@ -24,4 +24,4 @@ public:
 } n;
 main() { n.j(); }
 
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 4)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
+      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
+      //                  ^^^^^^^             ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
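pr102819-2.c and pr102819-3.c differ only in how v1 and v2 are paired with f[2][r] and f[2][i] in the real-lane statement. The C++ sketch below models one iteration of each loop, using illustrative values chosen so every intermediate is exactly representable. It shows why only the second loop is a complex multiplication. In pr102819-3.c, both lanes use the same second operand, (f[2][r] + v2) + i (f[2][i] + v1). In pr102819-2.c, the real lane uses f[2][r] + v1 and f[2][i] + v2, but the imaginary lane uses f[2][i] + v1 and f[2][r] + v2, so no single complex value is being multiplied. This is the kind of mismatch the new compatibility check is meant to reject.

```cpp
#include <cassert>
#include <complex>

using cplx = std::complex<double>;

// One iteration of the loop in pr102819-3.c, with f1 = f[1][r] + i f[1][i]
// and f2r, f2i standing for f[2][r] and f[2][i].
cplx
good_iter (cplx f1, double f2r, double f2i, double v1, double v2)
{
  double re = f1.real () * (f2r + v2) - f1.imag () * (f2i + v1);
  double im = f1.real () * (f2i + v1) + f1.imag () * (f2r + v2);
  return cplx (re, im);
}

// One iteration of the loop in pr102819-2.c: the real lane pairs f2r with v1
// and f2i with v2, but the imaginary lane pairs them the other way round.
cplx
bad_iter (cplx f1, double f2r, double f2i, double v1, double v2)
{
  double re = f1.real () * (f2r + v1) - f1.imag () * (f2i + v2);
  double im = f1.real () * (f2i + v1) + f1.imag () * (f2r + v2);
  return cplx (re, im);
}
```

The later pr102819-6.c to -8.c tests follow the same idea. The two statements disagree on the array being read (-6), on the + 1 adjustment (-7), or on which lane of f[1] is used (-8).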
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
+      //                  ^^^^^^^             ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
+      //                          ^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad3()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+      //                            ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+#include <stdio.h>
+#include <complex.h>
+
+#define N 200
+#define TYPE float
+#define TYPE2 float
+
+void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+  for (int i=0; i < N; i++)
+    {
+      c[i] -=  a[i] * b[0];
+    }
+}
+
+/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.  */
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { vect_double } } } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
+
+_Complex double b_0, c_0;
+
+void
+mul270snd (void)
+{
+  c_0 = b_0 * 1.0iF * 1.0iF;
+}
+
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
 }
 
 /* Return true when the data references A and B are accessing the same
-   memory object with the same access functions.  */
+   memory object with the same access functions.  Optionally skip the
+   last OFFSET dimensions in the data reference.  */
 
 static inline bool
-same_data_refs (data_reference_p a, data_reference_p b)
+same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
 {
   unsigned int i;
 
@@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
   if (!same_data_refs_base_objects (a, b))
     return false;
 
-  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
+  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
     if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
       return false;
 
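The is_linear_load_p hunk that follows changes the two duplicate-lane checks to test load % 2 instead of the raw lane index. The sketch below models only those two checks, which are PERM_ODDODD and PERM_EVENEVEN in the real code; the PERM_EVENODD and PERM_ODDEVEN candidates are left out. The names and the use_mod flag are hypothetical and exist only to compare the old and new behaviour. With the modulo, a permutation such as {3, 3}, which repeats the odd lane of the second two-lane element, now passes the odd/odd check. The patch does not spell out the motivation, so treat this only as an illustration of the arithmetic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reduction of is_linear_load_p to its two duplicate-lane
// checks.  USE_MOD selects the new behaviour (compare load % 2) or the old
// one (compare the raw lane index).
enum class dup_kind { unknown, odd_odd, even_even };

dup_kind
classify_dup (const std::vector<unsigned> &loads, bool use_mod)
{
  if (loads.empty ())
    return dup_kind::unknown;

  bool all_odd = true;
  bool all_even = true;
  for (unsigned load : loads)
    {
      unsigned lane = use_mod ? load % 2 : load;
      if (lane != 1)
        all_odd = false;
      if (lane != 0)
        all_even = false;
    }

  if (all_odd)
    return dup_kind::odd_odd;
  if (all_even)
    return dup_kind::even_even;
  return dup_kind::unknown;
}
```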
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 0350441fad9690cd5d04337171ca3470a064a571..f8da4153632a700680091f37305a5d3078fbb0c5 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
   int valid_patterns = 4;
   FOR_EACH_VEC_ELT (loads, i, load)
     {
-      if (candidates[0] != PERM_UNKNOWN && load != 1)
+      unsigned adj_load = load % 2;
+      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
 	{
 	  candidates[0] = PERM_UNKNOWN;
 	  valid_patterns--;
 	}
-      if (candidates[1] != PERM_UNKNOWN && load != 0)
+      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
 	{
 	  candidates[1] = PERM_UNKNOWN;
 	  valid_patterns--;
@@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
 internal_fn
 complex_add_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t * /* compat_cache */,
 			      slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_add_pattern::matches (op, perm_cache, node, &ops);
+    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
  * complex_mul_pattern
  ******************************************************************************/
 
-/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
-   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
-
-   If a negate is found then the values in ARGS are reordered such that the
-   negate node is always the second one and the entry is replaced by the child
-   of the negate node.  */
+/* Check OP1 has permutation KIND1 and OP2 KIND2; PERM_TOP matches any.  */
 
 static inline bool
-vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL)
+is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
+	      slp_tree op1, complex_perm_kinds_t kind1,
+	      slp_tree op2, complex_perm_kinds_t kind2)
 {
-  gcc_assert (args.length () == 2);
-  bool neg_found = false;
-
-  if (vect_match_expression_p (args[0], NEGATE_EXPR))
-    {
-      std::swap (args[0], args[1]);
-      neg_found = true;
-      if (neg_first_p)
-	*neg_first_p = true;
-    }
-  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
-    {
-      neg_found = true;
-      if (neg_first_p)
-	*neg_first_p = false;
-    }
+  complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
+  if (perm1 != kind1 && perm1 != PERM_TOP)
+    return false;
 
-  if (neg_found)
-    args[1] = SLP_TREE_CHILDREN (args[1])[0];
+  complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
+  if (perm2 != kind2 && perm2 != PERM_TOP)
+    return false;
 
-  return neg_found;
+  return true;
 }
 
-/* Helper function to check if PERM is KIND or PERM_TOP.  */
+enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
 
 static inline bool
-is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
+compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
+			    slp_tree a, int *pa, slp_tree b, int *pb)
 {
-  return perm == kind || perm == PERM_TOP;
-}
+  bool *tmp;
+  std::pair<slp_tree, slp_tree> key = std::make_pair (a, b);
+  if ((tmp = compat_cache->get (key)) != NULL)
+    return *tmp;
 
-/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
-   nodes but also that they represent an operation that is either a complex
-   multiplication or a complex multiplication by conjugated value.
+  compat_cache->put (key, false);
 
-   Of the negation is expected to be in the first half of the tree (As required
-   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjugate
-   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
-   second operand contains the conjugate operation.  */
+  if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
+    return false;
 
-static inline bool
-vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
-			      const vec<slp_tree> &left_op,
-			      const vec<slp_tree> &right_op,
-			     bool neg_first, bool *conj_first_operand,
-			     bool fms)
-{
-  /* The presence of a negation indicates that we have either a conjugate or a
-     rotation.  We need to distinguish which one.  */
-  *conj_first_operand = false;
-  complex_perm_kinds_t kind;
-
-  /* Complex conjugates have the negation on the imaginary part of the
-     number where rotations affect the real component.  So check if the
-     negation is on a dup of lane 1.  */
-  if (fms)
+  if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
+    return false;
+
+  /* Only internal nodes can be loads, so for external and constant nodes
+     all we can do is compare the scalar operands directly.  */
+  if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
     {
-      /* Canonicalization for fms is not consistent. So have to test both
-	 variants to be sure.  This needs to be fixed in the mid-end so
-	 this part can be simpler.  */
-      kind = linear_loads_p (perm_cache, right_op[0]);
-      if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
-	   && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
-			     PERM_ODDEVEN))
-	  || (kind == PERM_ODDEVEN
-	      && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
-			     PERM_ODDODD))))
-	return false;
+      for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
+	{
+	  tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
+	  tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
+	  if (!operand_equal_p (op1, op2, 0))
+	    return false;
+	}
+
+      compat_cache->put (key, true);
+      return true;
+    }
+
+  auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
+  auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
+
+  if (gimple_code (a_stmt) != gimple_code (b_stmt))
+    return false;
+
+  /* code, children, type, externals, loads, constants  */
+  if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
+    return false;
+
+  /* At this point, a and b are known to be the same gimple operations.  */
+  if (is_gimple_call (a_stmt))
+    {
+      if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
+			       dyn_cast <gcall *> (b_stmt)))
+	return false;
     }
+  else if (!is_gimple_assign (a_stmt))
+    return false;
   else
     {
-      if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
-	  && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
-			    PERM_ODDEVEN))
+      tree_code acode = gimple_assign_rhs_code (a_stmt);
+      tree_code bcode = gimple_assign_rhs_code (b_stmt);
+      if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
+	  && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
+	return true;
+
+      if (acode != bcode)
 	return false;
     }
 
-  /* Deal with differences in indexes.  */
-  int index1 = fms ? 1 : 0;
-  int index2 = fms ? 0 : 1;
-
-  /* Check if the conjugate is on the second first or second operand.  The
-     order of the node with the conjugate value determines this, and the dup
-     node must be one of lane 0 of the same DR as the neg node.  */
-  kind = linear_loads_p (perm_cache, left_op[index1]);
-  if (kind == PERM_TOP)
+  if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
+      || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
     {
-      if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
-	return true;
+      for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
+	{
+	  tree t1 = gimple_arg (a_stmt, i);
+	  tree t2 = gimple_arg (b_stmt, i);
+	  if (TREE_CODE (t1) != TREE_CODE (t2))
+	    return false;
+
+	  /* SSA names are compared via the children below, so skip them
+	     here.  */
+	  if (TREE_CODE (t1) == SSA_NAME)
+	    continue;
+
+	  if (!operand_equal_p (t1, t2, 0))
+	    return false;
+	}
     }
-  else if (kind == PERM_EVENODD && !neg_first)
+  else
     {
-      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN)
+      auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
+      auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
+      /* Don't check the last dimension as that's checked by the linearity
+	 checks.  This check is also much stricter than what we need
+	 because it doesn't consider loading from adjacent elements
+	 in the same struct as loading from the same base object.
+	 But for now, I'll play it safe.  */
+      if (!same_data_refs (dr1, dr2, 1))
 	return false;
-      return true;
     }
-  else if (kind == PERM_EVENEVEN && neg_first)
+
+  for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
     {
-      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD)
+      if (!compatible_complex_nodes_p (compat_cache,
+				       SLP_TREE_CHILDREN (a)[i], pa,
+				       SLP_TREE_CHILDREN (b)[i], pb))
 	return false;
-
-      *conj_first_operand = true;
-      return true;
     }
-  else
-    return false;
-
-  if (kind != PERM_EVENEVEN)
-    return false;
 
+  compat_cache->put (key, true);
   return true;
 }
 
-/* Helper function to help distinguish between a conjugate and a rotation in a
-   complex multiplication.  The operations have similar shapes but the order of
-   the load permutes are different.  This function returns TRUE when the order
-   is consistent with a multiplication or multiplication by conjugated
-   operand but returns FALSE if it's a multiplication by rotated operand.  */
-
 static inline bool
 vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
-			      const vec<slp_tree> &op,
-			      complex_perm_kinds_t permKind)
+			      slp_compat_nodes_map_t *compat_cache,
+			      vec<slp_tree> &left_op,
+			      vec<slp_tree> &right_op,
+			      bool subtract,
+			      enum _conj_status *_status)
 {
-  /* The left node is the more common case, test it first.  */
-  if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
+  auto_vec<slp_tree> ops;
+  enum _conj_status stats = CONJ_NONE;
+
+  /* The complex operations can occur in two layouts and two permute sequences
+     so declare them and re-use them.  */
+  int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}.  */
+		    , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}.  */
+		    };
+
+  /* Now for the corresponding permutes that go with these values.  */
+  complex_perm_kinds_t perms[][4]
+    = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
+      , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
+      };
+
+  /* These permutes are used during comparisons of externals on which
+     we require strict equality.  */
+  int cq[][4][2]
+    = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
+      , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
+      };
+
+  /* Default to style and perm 0, most operations use this one.  */
+  int style = 0;
+  int perm = subtract ? 1 : 0;
+
+  /* Check whether we have a negate operation; if so, absorb the node and
+     continue looking.  */
+  bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
+  bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
+
+  /* Determine which style we're looking at.  We only have different ones
+     whenever a conjugate is involved.  */
+  if (neg0 && neg1)
+    ;
+  else if (neg0)
     {
-      if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
-	return false;
+      right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
+      stats = CONJ_FST;
+      if (subtract)
+	perm = 0;
     }
-  return true;
+  else if (neg1)
+    {
+      right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
+      stats = CONJ_SND;
+      perm = 1;
+    }
+
+  *_status = stats;
+
+  /* Flatten the inputs after we've remapped them.  */
+  ops.create (4);
+  ops.safe_splice (left_op);
+  ops.safe_splice (right_op);
+
+  /* Extract out the elements to check.  */
+  slp_tree op0 = ops[styles[style][0]];
+  slp_tree op1 = ops[styles[style][1]];
+  slp_tree op2 = ops[styles[style][2]];
+  slp_tree op3 = ops[styles[style][3]];
+
+  /* Do the cheapest test first; if it fails, don't analyze further.  */
+  if (linear_loads_p (perm_cache, op0) != perms[perm][0]
+      || linear_loads_p (perm_cache, op1) != perms[perm][1]
+      || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
+    return false;
+
+  return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
+				     cq[perm][1])
+	 && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
+					cq[perm][3]);
 }
 
 /* This function combines two nodes containing only even and only odd lanes
@@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern
 internal_fn
 complex_mul_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t *compat_cache,
 			      slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op,
       || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
     return IFN_LAST;
 
-  bool neg_first = false;
-  bool conj_first_operand = false;
-  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
+  enum _conj_status status;
+  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
+				     right_op, false, &status))
+    return IFN_LAST;
 
-  if (!is_neg)
+  if (status == CONJ_NONE)
     {
-      /* A multiplication needs to multiply agains the real pair, otherwise
-	 the pattern matches that of FMS.   */
-      if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
-	  || vect_normalize_conj_loc (left_op))
-	return IFN_LAST;
       if (add0)
 	ifn = IFN_COMPLEX_FMA;
       else
@@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op,
     }
   else
     {
-      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
-					 neg_first, &conj_first_operand,
-					 false))
-	return IFN_LAST;
-
       if(add0)
 	ifn = IFN_COMPLEX_FMA_CONJ;
       else
@@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op,
     ops->quick_push (add0);
 
   complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
-  if (kind == PERM_EVENODD)
-    {
-      ops->quick_push (left_op[1]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (left_op[0]);
-    }
-  else if (kind == PERM_TOP)
+  if (kind == PERM_EVENODD || kind == PERM_TOP)
     {
       ops->quick_push (left_op[1]);
       ops->quick_push (right_op[1]);
       ops->quick_push (left_op[0]);
     }
-  else if (kind == PERM_EVENEVEN && !conj_first_operand)
+  else if (kind == PERM_EVENEVEN && status != CONJ_SND)
     {
       ops->quick_push (left_op[0]);
       ops->quick_push (right_op[0]);
@@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_mul_pattern::matches (op, perm_cache, node, &ops);
+    = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -1097,8 +1156,8 @@ complex_mul_pattern::build (vec_info *vinfo)
 
 	/* First re-arrange the children.  */
 	SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2);
-	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2];
-	SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[2];
 	break;
       }
     case IFN_COMPLEX_FMA:
@@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo)
 
 	/* First re-arrange the children.  */
 	SLP_TREE_CHILDREN (*this->m_node).safe_grow (3);
-	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0];
+	SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
 	SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3];
-	SLP_TREE_CHILDREN (*this->m_node)[2] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0];
 
 	/* Tell the builder to expect an extra argument.  */
 	this->m_num_args++;
@@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern
 internal_fn
 complex_fms_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t *compat_cache,
 			      slp_tree * ref_node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op,
   if (!vect_match_expression_p (root, MINUS_EXPR))
     return IFN_LAST;
 
+  /* TODO: Support invariants here; with the new layout, CADD can now
+	   match before we get a chance to try CFMS.  */
   auto nodes = SLP_TREE_CHILDREN (root);
   if (!vect_match_expression_p (nodes[1], MULT_EXPR)
       || vect_detect_pair_op (nodes[0]) != PLUS_MINUS)
@@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op,
       || !vect_match_expression_p (l0node[1], MULT_EXPR))
     return IFN_LAST;
 
-  bool is_neg = vect_normalize_conj_loc (left_op);
-
-  bool conj_first_operand = false;
-  if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
-				     &conj_first_operand, true))
+  enum _conj_status status;
+  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
+				     left_op, true, &status))
     return IFN_LAST;
 
-  if (!is_neg)
+  if (status == CONJ_NONE)
     ifn = IFN_COMPLEX_FMS;
-  else if (is_neg)
+  else
     ifn = IFN_COMPLEX_FMS_CONJ;
 
   if (!vect_pattern_validate_optab (ifn, *ref_node))
@@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op,
       ops->quick_push (right_op[1]);
       ops->quick_push (left_op[1]);
     }
-  else if (kind == PERM_TOP)
-    {
-      ops->quick_push (l0node[0]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[0]);
-    }
-  else if (kind == PERM_EVENEVEN && !is_neg)
-    {
-      ops->quick_push (l0node[0]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[0]);
-    }
   else
     {
       ops->quick_push (l0node[0]);
       ops->quick_push (right_op[1]);
       ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[1]);
+      ops->quick_push (left_op[0]);
     }
 
   return ifn;
@@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_fms_pattern::matches (op, perm_cache, node, &ops);
+    = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -1305,9 +1353,24 @@ complex_fms_pattern::build (vec_info *vinfo)
   SLP_TREE_CHILDREN (*this->m_node).create (3);
 
   /* First re-arrange the children.  */
+  switch (this->m_ifn)
+  {
+    case IFN_COMPLEX_FMS:
+      {
+	SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
+	SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
+	break;
+      }
+    case IFN_COMPLEX_FMS_CONJ:
+      {
+	SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
+	SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
+	break;
+      }
+    default:
+      gcc_unreachable ();
+  }
   SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
-  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
-  SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
 
   /* And then rewrite the node itself.  */
   complex_pattern::build (vinfo);
@@ -1334,11 +1397,12 @@ class complex_operations_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 };
 
 /* Dummy matches implementation for proxy object.  */
@@ -1347,6 +1411,7 @@ internal_fn
 complex_operations_pattern::
 matches (complex_operation_t /* op */,
 	 slp_tree_to_load_perm_map_t * /* perm_cache */,
+	 slp_compat_nodes_map_t * /* compat_cache */,
 	 slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
 {
   return IFN_LAST;
@@ -1356,6 +1421,7 @@ matches (complex_operation_t /* op */,
 
 vect_pattern*
 complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				       slp_compat_nodes_map_t *ccache,
 				       slp_tree *node)
 {
   auto_vec<slp_tree> ops;
@@ -1363,15 +1429,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn = IFN_LAST;
 
-  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_fms_pattern::mkInstance (node, &ops, ifn);
 
-  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_mul_pattern::mkInstance (node, &ops, ifn);
 
-  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_add_pattern::mkInstance (node, &ops, ifn);
 
@@ -1398,11 +1464,13 @@ class addsub_pattern : public vect_pattern
     void build (vec_info *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 };
 
 vect_pattern *
-addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
+addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
+			   slp_compat_nodes_map_t *, slp_tree *node_)
 {
   slp_tree node = *node_;
   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
 /* Return true if call statements CALL1 and CALL2 are similar enough
    to be combined into the same SLP group.  */
 
-static bool
+bool
 compatible_calls_p (gcall *call1, gcall *call2)
 {
   unsigned int nargs = gimple_call_num_args (call1);
@@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
 static bool
 vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
 			   slp_tree_to_load_perm_map_t *perm_cache,
+			   slp_compat_nodes_map_t *compat_cache,
 			   hash_set<slp_tree> *visited)
 {
   unsigned i;
@@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
   slp_tree child;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
     found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
-					  vinfo, perm_cache, visited);
+					  vinfo, perm_cache, compat_cache,
+					  visited);
 
   for (unsigned x = 0; x < num__slp_patterns; x++)
     {
-      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
+      vect_pattern *pattern
+	= slp_patterns[x] (perm_cache, compat_cache, ref_node);
       if (pattern)
 	{
 	  pattern->build (vinfo);
@@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
 static bool
 vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 			 hash_set<slp_tree> *visited,
-			 slp_tree_to_load_perm_map_t *perm_cache)
+			 slp_tree_to_load_perm_map_t *perm_cache,
+			 slp_compat_nodes_map_t *compat_cache)
 {
   DUMP_VECT_SCOPE ("vect_match_slp_patterns");
   slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
@@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 		     "Analyzing SLP tree %p for patterns\n",
 		     SLP_INSTANCE_TREE (instance));
 
-  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
+  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
+				    visited);
 }
 
 /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
@@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 
   hash_set<slp_tree> visited_patterns;
   slp_tree_to_load_perm_map_t perm_cache;
+  slp_compat_nodes_map_t compat_cache;
 
   /* See if any patterns can be found in the SLP tree.  */
   bool pattern_found = false;
   FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
     pattern_found |= vect_match_slp_patterns (instance, vinfo,
-					      &visited_patterns, &perm_cache);
+					      &visited_patterns, &perm_cache,
+					      &compat_cache);
 
   /* If any were found optimize permutations of loads.  */
   if (pattern_found)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
 extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
 extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
 extern void vect_free_slp_tree (slp_tree);
+extern bool compatible_calls_p (gcall *, gcall *);
 
 /* In tree-vect-patterns.c.  */
 extern void
@@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds {
 typedef hash_map <slp_tree, complex_perm_kinds_t>
   slp_tree_to_load_perm_map_t;
 
+/* Cache mapping a pair of nodes to whether they are compatible.  */
+typedef pair_hash <nofree_ptr_hash <_slp_tree>,
+		   nofree_ptr_hash <_slp_tree>> slp_node_hash;
+typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
+
+
 /* Vector pattern matcher base class.  All SLP pattern matchers must inherit
    from this type.  */
 
@@ -2338,7 +2345,8 @@ class vect_pattern
   public:
 
     /* Create a new instance of the pattern matcher class of the given type.  */
-    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
+				    slp_compat_nodes_map_t *, slp_tree *);
 
     /* Build the pattern from the data collected so far.  */
     virtual void build (vec_info *) = 0;
@@ -2352,6 +2360,7 @@ class vect_pattern
 
 /* Function pointer to create a new pattern matcher from a generic type.  */
 typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
+					      slp_compat_nodes_map_t *,
 					      slp_tree *);
 
 /* List of supported pattern matchers.  */

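For readers following the table-driven check in vect_validate_multiplication, here is a minimal standalone C++ sketch of how the styles and perms tables are consulted. PermKind, eq_or_top and layout_matches are hypothetical names, not GCC code. The definition of the new five-argument is_eq_or_top is not shown in this hunk, so the sketch assumes it accepts, for each node of the second pair, either the expected permute or PERM_TOP.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for complex_perm_kinds_t; TOP means "no known permute".
enum PermKind { EVENEVEN, ODDODD, EVENODD, ODDEVEN, TOP };

// Layouts, indexing into the flattened operands {L1, L2, R1, R2}.
constexpr int styles[2][4] = { { 0, 2, 1, 3 },   // {L1, R1} + {L2, R2}
                               { 0, 3, 1, 2 } }; // {L1, R2} + {L2, R1}

// Expected load permute for each slot, copied from the perms table in the patch.
constexpr PermKind perms[2][4]
  = { { EVENEVEN, ODDODD, EVENODD, ODDEVEN },
      { EVENODD, ODDEVEN, EVENEVEN, ODDODD } };

// Assumed semantics of the second-pair check: exact match or TOP.
static bool eq_or_top (PermKind k, PermKind expected)
{
  return k == expected || k == TOP;
}

// OPS holds the load permutes of {L1, L2, R1, R2}, in that order.
bool layout_matches (const std::array<PermKind, 4> &ops, int style, int perm)
{
  PermKind k0 = ops[styles[style][0]];
  PermKind k1 = ops[styles[style][1]];
  PermKind k2 = ops[styles[style][2]];
  PermKind k3 = ops[styles[style][3]];
  // Strict check on the first pair, TOP-tolerant check on the second,
  // mirroring the order of the tests in vect_validate_multiplication.
  return k0 == perms[perm][0] && k1 == perms[perm][1]
         && eq_or_top (k2, perms[perm][2]) && eq_or_top (k3, perms[perm][3]);
}
```

For example, with {L1, L2, R1, R2} loaded as {EVENEVEN, EVENODD, ODDODD, ODDEVEN}, style 0 with perm 0 matches, while perm 1 (the subtract variant) is rejected by the first, cheapest comparison.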

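The recursive walk in compatible_complex_nodes_p can likewise be modelled outside GCC. This is a simplified sketch, not the real SLP data structures: Node, CompatCache and compatible are hypothetical, and only the operation code and child count are compared. It does reproduce the caching scheme the patch uses: a pair is recorded as incompatible before its children are visited, so a failure (or a revisit of the same pair while its answer is still pending) yields false, and the entry is flipped to true only once every child pair has matched.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much-simplified stand-in for an SLP node.
struct Node
{
  std::string op;           // operation code, e.g. "mult" or "load"
  std::vector<Node *> kids; // operand nodes
};

// Keyed on the ordered pair (a, b), like the pair_hash key in the patch.
using CompatCache = std::map<std::pair<const Node *, const Node *>, bool>;

bool compatible (CompatCache &cache, const Node *a, const Node *b)
{
  auto key = std::make_pair (a, b);
  auto it = cache.find (key);
  if (it != cache.end ())
    return it->second;

  // Seed pessimistically: a re-query during this walk answers false.
  cache[key] = false;

  if (a->op != b->op || a->kids.size () != b->kids.size ())
    return false;

  for (std::size_t i = 0; i < a->kids.size (); i++)
    if (!compatible (cache, a->kids[i], b->kids[i]))
      return false;

  cache[key] = true;
  return true;
}
```

Because the key is ordered, a query for (b, a) is computed and cached separately from (a, b).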

* [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms
  2021-12-17 15:42 [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Tamar Christina
@ 2021-12-17 15:42 ` Tamar Christina
  2021-12-17 16:24   ` Richard Sandiford
  2021-12-17 15:43 ` [3/3 PATCH][AArch32] " Tamar Christina
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Tamar Christina @ 2021-12-17 15:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 3591 bytes --]

Hi All,

Following the first patch in the series, this updates the optabs to expect
the canonical operand order.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

OK for master, and for backport along with the first patch?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4,
	cmul<conj_op><mode>3): Use canonical order.
	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4,
	cmul<conj_op><mode>3): Likewise.

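The reordered expanders below take the multiplicands in operands 1 and 2 and, for cml*, the accumulator in operand 3, matching the (plus (unspec [op1 op2]) op3) form in the new RTL. As a reference point, here is a scalar C++ sketch of the element-wise semantics as I read the cmul/cmla optab descriptions in the GCC internals manual; the ref_* helper names are hypothetical. It also shows why a canonical order matters: plain complex multiplication commutes, but the conjugate forms do not.

```cpp
#include <cassert>
#include <complex>

using cplx = std::complex<double>;

// Element-wise reference semantics (hypothetical helper names).
cplx ref_cmul (cplx op1, cplx op2) { return op1 * op2; }
cplx ref_cmul_conj (cplx op1, cplx op2) { return op1 * std::conj (op2); }

cplx ref_cmla (cplx op1, cplx op2, cplx op3) { return op1 * op2 + op3; }
cplx ref_cmla_conj (cplx op1, cplx op2, cplx op3)
{
  return op1 * std::conj (op2) + op3;
}
```

With op1 = 1+2i and op2 = 3+4i, ref_cmul_conj gives 11+2i, but swapping the operands gives 11-2i, so an expander that silently swaps operands 1 and 2 produces the conjugate of the intended result.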
--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9cfb5649f9b0e73 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
 ;; remainder.  Because of this, expand early.
 (define_expand "cml<fcmac1><conj_op><mode>4"
   [(set (match_operand:VHSDF 0 "register_operand")
-	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
-		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
-				   (match_operand:VHSDF 3 "register_operand")]
-				   FCMLA_OP)))]
+	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+				   (match_operand:VHSDF 2 "register_operand")]
+				   FCMLA_OP)
+		    (match_operand:VHSDF 3 "register_operand")))]
   "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
-  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
-						 operands[3], operands[2]));
+  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
+						 operands[1], operands[2]));
   emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
-						 operands[3], operands[2]));
+						 operands[1], operands[2]));
   DONE;
 })
 
@@ -583,9 +583,9 @@ (define_expand "cmul<conj_op><mode>3"
   rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
   rtx res1 = gen_reg_rtx (<MODE>mode);
   emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
-						 operands[2], operands[1]));
+						 operands[1], operands[2]));
   emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
-						 operands[2], operands[1]));
+						 operands[1], operands[2]));
   DONE;
 })
 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 9ef968840c20a3049901b3f8a919cf27ded1da3e..96a57442c7eb5f1080c8014a2f0311b2350de852 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -7278,11 +7278,11 @@ (define_expand "cml<fcmac1><conj_op><mode>4"
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn
     (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, operands[4],
-					     operands[3], operands[2],
-					     operands[1], operands[5]));
+					     operands[1], operands[2],
+					     operands[3], operands[5]));
   emit_insn
     (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], operands[4],
-					     operands[3], operands[2],
+					     operands[1], operands[2],
 					     tmp, operands[5]));
   DONE;
 })
@@ -7305,11 +7305,11 @@ (define_expand "cmul<conj_op><mode>3"
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn
     (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, pred_reg,
-					     operands[2], operands[1],
+					     operands[1], operands[2],
 					     accum, gp_mode));
   emit_insn
     (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], pred_reg,
-					     operands[2], operands[1],
+					     operands[1], operands[2],
 					     tmp, gp_mode));
   DONE;
 })


-- 


* [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms
  2021-12-17 15:42 [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Tamar Christina
  2021-12-17 15:42 ` [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms Tamar Christina
@ 2021-12-17 15:43 ` Tamar Christina
  2021-12-20 16:22   ` Tamar Christina
  2021-12-17 16:18 ` [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Richard Sandiford
  2022-01-10 13:00 ` Richard Biener
  3 siblings, 1 reply; 18+ messages in thread
From: Tamar Christina @ 2021-12-17 15:43 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana.Radhakrishnan, Richard.Earnshaw, nickc, Kyrylo.Tkachov


Hi All,

After the first patch in the series, this updates the optabs to expect the
canonical sequence.

Bootstrapped and regtested on arm-none-linux-gnueabihf with no issues.

OK for master, and to backport along with the first patch?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* config/arm/neon.md (cmul<conj_op><mode>3): Use canon order.
	* config/arm/vec-common.md (cmul<conj_op><mode>3,
	cml<fcmac1><conj_op><mode>4): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 8b0a396947cc8e7345f178b926128d7224fb218a..2b6ae67a7ec6bef505c2eaef0ec495d14c656495 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2859,9 +2859,9 @@ (define_expand "cmul<conj_op><mode>3"
   rtx res1 = gen_reg_rtx (<MODE>mode);
   rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
   emit_insn (gen_neon_vcmla<rotsplit1><mode> (res1, tmp,
-					      operands[2], operands[1]));
+					      operands[1], operands[2]));
   emit_insn (gen_neon_vcmla<rotsplit2><mode> (operands[0], res1,
-					      operands[2], operands[1]));
+					      operands[1], operands[2]));
   DONE;
 })
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index e71d9b3811fde62159f5c21944fef9fe3f97b4bd..0940e987de53e191f4abdd248c654aed69f016f7 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -239,14 +239,14 @@ (define_expand "cmul<conj_op><mode>3"
     {
       rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
       emit_insn (gen_arm_vcmla<rotsplit1><mode> (res1, tmp,
-						 operands[2], operands[1]));
+						 operands[1], operands[2]));
     }
   else
     emit_insn (gen_arm_vcmla<rotsplit1><mode> (res1, CONST0_RTX (<MODE>mode),
-					       operands[2], operands[1]));
+					       operands[1], operands[2]));
 
   emit_insn (gen_arm_vcmla<rotsplit2><mode> (operands[0], res1,
-					     operands[2], operands[1]));
+					     operands[1], operands[2]));
   DONE;
 })
 
@@ -265,18 +265,18 @@ (define_expand "arm_vcmla<rot><mode>"
 ;; remainder.  Because of this, expand early.
 (define_expand "cml<fcmac1><conj_op><mode>4"
   [(set (match_operand:VF 0 "register_operand")
-	(plus:VF (match_operand:VF 1 "register_operand")
-		 (unspec:VF [(match_operand:VF 2 "register_operand")
-			     (match_operand:VF 3 "register_operand")]
-			    VCMLA_OP)))]
+	(plus:VF (unspec:VF [(match_operand:VF 1 "register_operand")
+			     (match_operand:VF 2 "register_operand")]
+			    VCMLA_OP)
+		 (match_operand:VF 3 "register_operand")))]
   "(TARGET_COMPLEX || (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT
 		      && ARM_HAVE_<MODE>_ARITH)) && !BYTES_BIG_ENDIAN"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
-  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[1],
-					     operands[3], operands[2]));
+  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[3],
+					     operands[1], operands[2]));
   emit_insn (gen_arm_vcmla<rotsplit2><mode> (operands[0], tmp,
-					     operands[3], operands[2]));
+					     operands[1], operands[2]));
   DONE;
 })
 





* Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
  2021-12-17 15:42 [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Tamar Christina
  2021-12-17 15:42 ` [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms Tamar Christina
  2021-12-17 15:43 ` [3/3 PATCH][AArch32] " Tamar Christina
@ 2021-12-17 16:18 ` Richard Sandiford
  2021-12-20 16:18   ` Tamar Christina
  2022-01-10 13:00 ` Richard Biener
  3 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2021-12-17 16:18 UTC (permalink / raw)
  To: Tamar Christina via Gcc-patches; +Cc: Tamar Christina, nd, rguenther

Just a comment on the documentation:

Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
>  a multiply and accumulate of complex numbers.
>  
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] += a[i] * b[i];
> +      op2[i] += op1[i] * op2[i];
>      @}

I think this should be:

  op0[i] = op1[i] * op2[i] + op3[i];

since operand 0 is the output and operand 3 is the accumulator input.
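A minimal scalar C sketch of the operand numbering proposed here for the multiply-accumulate optabs: operand 0 is the output, operands 1 and 2 are the multiplicands and operand 3 is the addend. The `ref_*` names are hypothetical, `double complex` stands in for `complex TYPE`, and the cfms form follows the wording of the updated md.texi later in this thread. It illustrates the documented semantics only and is not GCC code.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference for cfma: op0 = op1 * op2 + op3.
   Operand 0 is the output, operand 3 the accumulator input.  */
static void
ref_cfma (size_t n, double complex *op0, const double complex *op1,
          const double complex *op2, const double complex *op3)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] * op2[i] + op3[i];
}

/* Hypothetical reference for cfma_conj: only the second
   multiplication operand (operand 2) is conjugated.  */
static void
ref_cfma_conj (size_t n, double complex *op0, const double complex *op1,
               const double complex *op2, const double complex *op3)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] * conj (op2[i]) + op3[i];
}

/* Hypothetical reference for cfms, as worded in the updated
   md.texi: op0 = op1 * op2 - op3.  */
static void
ref_cfms (size_t n, double complex *op0, const double complex *op1,
          const double complex *op2, const double complex *op3)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] * op2[i] - op3[i];
}
```

With op1 = 1+2i, op2 = 3+4i and op3 = 1+1i, the three functions produce -4+11i, 12+3i and -6+9i respectively.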

Same idea for the others.  For:

> @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
>  complex numbers.
>  
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] = a[i] * b[i];
> +      op2[i] = op0[i] * op1[i];

…this, I think it should be:

  op0[i] = op1[i] * op2[i];
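The same numbering for the multiply-only optabs, again as a hypothetical scalar C sketch (function names invented for illustration, `double complex` standing in for `complex TYPE`). Operand 0 is the result, and in the conjugate form only operand 2 is conjugated.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference for cmul: op0 = op1 * op2.  */
static void
ref_cmul (size_t n, double complex *op0, const double complex *op1,
          const double complex *op2)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] * op2[i];
}

/* Hypothetical reference for cmul_conj: op0 = op1 * conj (op2).  */
static void
ref_cmul_conj (size_t n, double complex *op0, const double complex *op1,
               const double complex *op2)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] * conj (op2[i]);
}
```

With op1 = 1+2i and op2 = 3+4i these give -5+10i and 11+2i respectively.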

Thanks,
Richard


* Re: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms
  2021-12-17 15:42 ` [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms Tamar Christina
@ 2021-12-17 16:24   ` Richard Sandiford
  2021-12-17 16:48     ` Richard Sandiford
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2021-12-17 16:24 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> After the first patch in the series this updates the optabs to expect the
> canonical sequence.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and backport along with the first patch?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/102819
> 	PR tree-optimization/103169
> 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4,
> 	cmul<conj_op><mode>3): Use canonical order.
> 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4,
> 	cmul<conj_op><mode>3): Likewise.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9cfb5649f9b0e73 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
>  ;; remainder.  Because of this, expand early.
>  (define_expand "cml<fcmac1><conj_op><mode>4"
>    [(set (match_operand:VHSDF 0 "register_operand")
> -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
> -				   (match_operand:VHSDF 3 "register_operand")]
> -				   FCMLA_OP)))]
> +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> +				   (match_operand:VHSDF 2 "register_operand")]
> +				   FCMLA_OP)
> +		    (match_operand:VHSDF 3 "register_operand")))]
>    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
> -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
> -						 operands[3], operands[2]));
> +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
> +						 operands[1], operands[2]));
>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
> -						 operands[3], operands[2]));
> +						 operands[1], operands[2]));
>    DONE;
>  })
>  
> @@ -583,9 +583,9 @@ (define_expand "cmul<conj_op><mode>3"
>    rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
>    rtx res1 = gen_reg_rtx (<MODE>mode);
>    emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
> -						 operands[2], operands[1]));
> +						 operands[1], operands[2]));
>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
> -						 operands[2], operands[1]));
> +						 operands[1], operands[2]));

This doesn't look right.  Going from the documentation, patch 1 isn't
changing the operand order for CMUL: the conjugated operand (if there
is one) is still operand 2.  The FCMLA sequences use the opposite order,
where the conjugated operand (if there is one) is operand 1.  So I think
the reversal here is still needed.

Same for the multiplication operands in CML* above.
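Why the operand order matters only for the conjugate forms can be checked in a few lines of C. Plain complex multiplication commutes, but swapping the operands of a conjugated multiply yields the conjugate of the intended result, because x * conj (y) == conj (y * conj (x)). The helper name below is hypothetical; it models the documented cmul_conj semantics for a single element.

```c
#include <assert.h>
#include <complex.h>

/* cmul_conj semantics for one element: X is operand 1, Y is
   operand 2, and only operand 2 is conjugated.  */
static double complex
mul_conj (double complex x, double complex y)
{
  return x * conj (y);
}
```

With a = 1+2i and b = 3+4i, mul_conj (a, b) is 11+2i while mul_conj (b, a) is 11-2i, so an expander that passes the operands in the wrong order silently computes the conjugate.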

Thanks,
Richard


* Re: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms
  2021-12-17 16:24   ` Richard Sandiford
@ 2021-12-17 16:48     ` Richard Sandiford
  2021-12-20 16:20       ` Tamar Christina
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Sandiford @ 2021-12-17 16:48 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov

Richard Sandiford <richard.sandiford@arm.com> writes:
> Tamar Christina <tamar.christina@arm.com> writes:
>> Hi All,
>>
>> After the first patch in the series this updates the optabs to expect the
>> canonical sequence.
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master? and backport along with the first patch?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/102819
>> 	PR tree-optimization/103169
>> 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4,
>> 	cmul<conj_op><mode>3): Use canonical order.
>> 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4,
>> 	cmul<conj_op><mode>3): Likewise.
>>
>> --- inline copy of patch -- 
>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9cfb5649f9b0e73 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
>>  ;; remainder.  Because of this, expand early.
>>  (define_expand "cml<fcmac1><conj_op><mode>4"
>>    [(set (match_operand:VHSDF 0 "register_operand")
>> -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
>> -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
>> -				   (match_operand:VHSDF 3 "register_operand")]
>> -				   FCMLA_OP)))]
>> +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
>> +				   (match_operand:VHSDF 2 "register_operand")]
>> +				   FCMLA_OP)
>> +		    (match_operand:VHSDF 3 "register_operand")))]
>>    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
>>  {
>>    rtx tmp = gen_reg_rtx (<MODE>mode);
>> -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
>> -						 operands[3], operands[2]));
>> +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
>> +						 operands[1], operands[2]));
>>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
>> -						 operands[3], operands[2]));
>> +						 operands[1], operands[2]));
>>    DONE;
>>  })
>>  
>> @@ -583,9 +583,9 @@ (define_expand "cmul<conj_op><mode>3"
>>    rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
>>    rtx res1 = gen_reg_rtx (<MODE>mode);
>>    emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
>> -						 operands[2], operands[1]));
>> +						 operands[1], operands[2]));
>>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
>> -						 operands[2], operands[1]));
>> +						 operands[1], operands[2]));
>
> This doesn't look right.  Going from the documentation, patch 1 isn't
> changing the operand order for CMUL: the conjugated operand (if there
> is one) is still operand 2.  The FCMLA sequences use the opposite order,
> where the conjugated operand (if there is one) is operand 1.  So I think

I meant “the first multiplication operand” rather than “operand 1” here.

> the reversal here is still needed.
>
> Same for the multiplication operands in CML* above.
>
> Thanks,
> Richard
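The point about FCMLA can be checked with a small model of one complex element of `FCMLA Vd, Vn, Vm, #rot`. The sketch below is written from the per-element description of FCMLA in the Arm architecture reference manual, ignores fused-multiply-add rounding, and uses hypothetical helper names. It also assumes that the conjugate expanders use the rotation pair 0 then 270, which is not stated in this thread. Under those assumptions, 0 then 90 accumulates n * m while 0 then 270 accumulates conj (n) * m, so the conjugated value must be the first multiplication operand n. For cmul_conj, whose operand 2 is the conjugated one, that means passing operand 2 as n and operand 1 as m, which is the reversed order.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical per-element model of FCMLA with accumulator D and
   multiplication operands N (first) and M (second).  */
static double complex
fcmla (double complex d, double complex n, double complex m, int rot)
{
  double re = creal (d), im = cimag (d);
  double nr = creal (n), ni = cimag (n);
  double mr = creal (m), mi = cimag (m);
  switch (rot)
    {
    case 0:   re += nr * mr; im += nr * mi; break;
    case 90:  re -= ni * mi; im += ni * mr; break;
    case 180: re -= nr * mr; im -= nr * mi; break;
    case 270: re += ni * mi; im -= ni * mr; break;
    default:  assert (0 && "rotation must be 0, 90, 180 or 270"); break;
    }
  return CMPLX (re, im);
}

/* Two-instruction expansion: rotation 0 followed by ROT2.
   ROT2 == 90 accumulates n * m; ROT2 == 270 accumulates conj (n) * m.  */
static double complex
fcmla_pair (double complex acc, double complex n, double complex m, int rot2)
{
  return fcmla (fcmla (acc, n, m, 0), n, m, rot2);
}
```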


* RE: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
  2021-12-17 16:18 ` [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Richard Sandiford
@ 2021-12-20 16:18   ` Tamar Christina
  2022-01-10 10:16     ` Tamar Christina
  0 siblings, 1 reply; 18+ messages in thread
From: Tamar Christina @ 2021-12-20 16:18 UTC (permalink / raw)
  To: Richard Sandiford, Tamar Christina via Gcc-patches; +Cc: nd, rguenther




> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Friday, December 17, 2021 4:19 PM
> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; nd <nd@arm.com>;
> rguenther@suse.de
> Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex
> numbers validation routines.
> 
> Just a comment on the documentation:
> 
> Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
> >  a multiply and accumulate of complex numbers.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] += a[i] * b[i];
> > +      op2[i] += op1[i] * op2[i];
> >      @}
> 
> I think this should be:
> 
>   op0[i] = op1[i] * op2[i] + op3[i];
> 
> since operand 0 is the output and operand 3 is the accumulator input.
> 
> Same idea for the others.  For:
> 
> > @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
> >  complex numbers.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] = a[i] * b[i];
> > +      op2[i] = op0[i] * op1[i];
> 
> …this I think it should be:
> 
>   op0[i] = op1[i] * op2[i];

Updated patch attached.

Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no regressions.

OK for master, and to backport to GCC 11 after it has had some time to stew?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* doc/md.texi: Update docs for cfms, cfma.
	* tree-data-ref.h (same_data_refs): Accept optional offset.
	* tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
	patterns.
	(vect_normalize_conj_loc): Remove.
	(is_eq_or_top): Change to take two nodes.
	(enum _conj_status, compatible_complex_nodes_p,
	vect_validate_multiplication): New.
	(class complex_add_pattern, complex_add_pattern::matches,
	complex_add_pattern::recognize, class complex_mul_pattern,
	complex_mul_pattern::recognize, class complex_fms_pattern,
	complex_fms_pattern::recognize, class complex_operations_pattern,
	complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
	new cache.
	(complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
	cache and use new validation code.
	* tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
	vect_analyze_slp): Pass along cache.
	(compatible_calls_p): Expose.
	* tree-vectorizer.h (compatible_calls_p, slp_node_hash,
	slp_compat_nodes_map_t): New.
	(class vect_pattern): Update signatures include new cache.
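The ChangeLog entry for is_linear_load_p only says it fixes an issue with repeating patterns. Judging from the tree-vect-slp-patterns.c hunk in the patch, the two parity candidates now compare `load % 2` instead of the raw load index. The standalone sketch below models only those two parity checks, not the whole function, and all names in it are hypothetical. It assumes the interleaved layout where even lanes hold real parts and odd lanes imaginary parts. A permutation such as {1, 3}, which selects the odd lanes of two consecutive complex values, is classified as all-odd only once the index is reduced modulo 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LANES_UNKNOWN, LANES_ALL_ODD, LANES_ALL_EVEN } lane_parity;

/* Hypothetical classification of a load permutation by lane parity.
   REDUCE_MOD2 selects the patched behaviour (compare LOADS[i] % 2)
   instead of the old one (compare LOADS[i] itself).  */
static lane_parity
classify_loads (const unsigned *loads, size_t n, bool reduce_mod2)
{
  assert (loads != NULL || n == 0);
  if (n == 0)
    return LANES_UNKNOWN;

  bool all_odd = true, all_even = true;
  for (size_t i = 0; i < n; i++)
    {
      unsigned adj = reduce_mod2 ? loads[i] % 2 : loads[i];
      if (adj != 1)
        all_odd = false;
      if (adj != 0)
        all_even = false;
    }

  if (all_odd)
    return LANES_ALL_ODD;
  if (all_even)
    return LANES_ALL_EVEN;
  return LANES_UNKNOWN;
}
```

Without the reduction, {1, 3} and {0, 2} both come out as unknown; with it they are all-odd and all-even, while a genuinely mixed permutation such as {0, 1} stays unknown either way.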

gcc/testsuite/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* g++.dg/vect/pr99149.cc: xfail for now.
	* gcc.dg/vect/complex/pr102819-1.c: New test.
	* gcc.dg/vect/complex/pr102819-2.c: New test.
	* gcc.dg/vect/complex/pr102819-3.c: New test.
	* gcc.dg/vect/complex/pr102819-4.c: New test.
	* gcc.dg/vect/complex/pr102819-5.c: New test.
	* gcc.dg/vect/complex/pr102819-6.c: New test.
	* gcc.dg/vect/complex/pr102819-7.c: New test.
	* gcc.dg/vect/complex/pr102819-8.c: New test.
	* gcc.dg/vect/complex/pr102819-9.c: New test.
	* gcc.dg/vect/complex/pr103169.c: New test.
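The new pr102819 tests differ only in which values feed the real and imaginary lanes. One way to see which of them really compute a complex product is to evaluate both lanes in scalar C and compare against `float complex` multiplication. The hypothetical helpers below copy the lane formulas of one iteration of pr102819-3.c (good1) and pr102819-2.c (bad1); they are not part of the patch. In good1 the second factor is consistently (f[2][r] + v2) + i*(f[2][i] + v1). In bad1 the real lane pairs f[2][r] with v1 while the imaginary lane pairs it with v2, so when v1 != v2 no single complex value reproduces both lanes.

```c
#include <assert.h>
#include <complex.h>
#include <stdbool.h>

/* AR/AI stand for f[1][r]/f[1][i] and BR/BI for f[2][r]/f[2][i].  */
static void
good1_lanes (float ar, float ai, float br, float bi, float v1, float v2,
             float *re, float *im)
{
  *re = ar * (br + v2) - ai * (bi + v1);
  *im = ar * (bi + v1) + ai * (br + v2);
}

static void
bad1_lanes (float ar, float ai, float br, float bi, float v1, float v2,
            float *re, float *im)
{
  *re = ar * (br + v1) - ai * (bi + v2);
  *im = ar * (bi + v1) + ai * (br + v2);
}

/* True if the lane pair (RE, IM) equals the complex product A * B.  */
static bool
matches_product (float re, float im, float complex a, float complex b)
{
  float complex p = a * b;
  return re == crealf (p) && im == cimagf (p);
}
```

With f[1] = 1+2i, f[2] = 3+4i, v1 = 5 and v2 = 7, good1 yields -8+29i, which is (1+2i) * (10+9i), while bad1 yields -14+29i, which matches neither candidate factor (8+11i from the real lane, 10+9i from the imaginary lane).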

--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..ad06b02d36876082afe4c3f3fb51887f7a522b23 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6325,12 +6325,13 @@ Perform a vector multiply and accumulate that is semantically the same as
 a multiply and accumulate of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
+  complex TYPE op3[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * b[i];
+      op0[i] = op1[i] * op2[i] + op3[i];
     @}
 @end smallexample
 
@@ -6348,12 +6349,13 @@ the same as a multiply and accumulate of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
+  complex TYPE op3[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * conj (b[i]);
+      op0[i] = op1[i] * conj (op2[i]) + op3[i];
     @}
 @end smallexample
 
@@ -6370,12 +6372,13 @@ Perform a vector multiply and subtract that is semantically the same as
 a multiply and subtract of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
+  complex TYPE op3[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * b[i];
+      op0[i] = op1[i] * op2[i] - op3[i];
     @}
 @end smallexample
 
@@ -6393,12 +6396,13 @@ the same as a multiply and subtract of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
+  complex TYPE op3[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * conj (b[i]);
+      op0[i] = op1[i] * conj (op2[i]) - op3[i];
     @}
 @end smallexample
 
@@ -6415,12 +6419,12 @@ Perform a vector multiply that is semantically the same as multiply of
 complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * b[i];
+      op0[i] = op1[i] * op2[i];
     @}
 @end smallexample
 
@@ -6437,12 +6441,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
 multiply of complex numbers where the second multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * conj (b[i]);
+      op0[i] = op1[i] * conj (op2[i]);
     @}
 @end smallexample
 
diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
--- a/gcc/testsuite/g++.dg/vect/pr99149.cc
+++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
@@ -24,4 +24,4 @@ public:
 } n;
 main() { n.j(); }
 
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 4)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
+      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
+      //                  ^^^^^^^             ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
+      //                  ^^^^^^^             ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
+      //                          ^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad3()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+      //                            ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+#include <stdio.h>
+#include <complex.h>
+
+#define N 200
+#define TYPE float
+#define TYPE2 float
+
+void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+  for (int i=0; i < N; i++)
+    {
+      c[i] -=  a[i] * b[0];
+    }
+}
+
+/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.  */
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { vect_double } } } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
+
+_Complex double b_0, c_0;
+
+void
+mul270snd (void)
+{
+  c_0 = b_0 * 1.0iF * 1.0iF;
+}
+
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
 }
 
 /* Return true when the data references A and B are accessing the same
-   memory object with the same access functions.  */
+   memory object with the same access functions.  Optionally skip the
+   last OFFSET dimensions in the data reference.  */
 
 static inline bool
-same_data_refs (data_reference_p a, data_reference_p b)
+same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
 {
   unsigned int i;
 
@@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
   if (!same_data_refs_base_objects (a, b))
     return false;
 
-  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
+  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
     if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
       return false;
 
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 0350441fad9690cd5d04337171ca3470a064a571..020c29bba08c5bd80503a2dbc04292f8fd310b3c 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
   int valid_patterns = 4;
   FOR_EACH_VEC_ELT (loads, i, load)
     {
-      if (candidates[0] != PERM_UNKNOWN && load != 1)
+      unsigned adj_load = load % 2;
+      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
 	{
 	  candidates[0] = PERM_UNKNOWN;
 	  valid_patterns--;
 	}
-      if (candidates[1] != PERM_UNKNOWN && load != 0)
+      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
 	{
 	  candidates[1] = PERM_UNKNOWN;
 	  valid_patterns--;
@@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
 internal_fn
 complex_add_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t * /* compat_cache */,
 			      slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_add_pattern::matches (op, perm_cache, node, &ops);
+    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
  * complex_mul_pattern
  ******************************************************************************/
 
-/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
-   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
-
-   If a negate is found then the values in ARGS are reordered such that the
-   negate node is always the second one and the entry is replaced by the child
-   of the negate node.  */
+/* Check that OP1 and OP2 have load permutes KIND1 and KIND2 or PERM_TOP.  */
 
 static inline bool
-vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL)
+is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
+	      slp_tree op1, complex_perm_kinds_t kind1,
+	      slp_tree op2, complex_perm_kinds_t kind2)
 {
-  gcc_assert (args.length () == 2);
-  bool neg_found = false;
-
-  if (vect_match_expression_p (args[0], NEGATE_EXPR))
-    {
-      std::swap (args[0], args[1]);
-      neg_found = true;
-      if (neg_first_p)
-	*neg_first_p = true;
-    }
-  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
-    {
-      neg_found = true;
-      if (neg_first_p)
-	*neg_first_p = false;
-    }
+  complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
+  if (perm1 != kind1 && perm1 != PERM_TOP)
+    return false;
 
-  if (neg_found)
-    args[1] = SLP_TREE_CHILDREN (args[1])[0];
+  complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
+  if (perm2 != kind2 && perm2 != PERM_TOP)
+    return false;
 
-  return neg_found;
+  return true;
 }
 
-/* Helper function to check if PERM is KIND or PERM_TOP.  */
+enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
 
 static inline bool
-is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
+compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
+			    slp_tree a, int *pa, slp_tree b, int *pb)
 {
-  return perm == kind || perm == PERM_TOP;
-}
+  bool *tmp;
+  std::pair<slp_tree, slp_tree> key = std::make_pair (a, b);
+  if ((tmp = compat_cache->get (key)) != NULL)
+    return *tmp;
 
-/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
-   nodes but also that they represent an operation that is either a complex
-   multiplication or a complex multiplication by conjugated value.
+  compat_cache->put (key, false);
 
-   Of the negation is expected to be in the first half of the tree (As required
-   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjugate
-   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
-   second operand contains the conjugate operation.  */
+  if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
+    return false;
 
-static inline bool
-vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
-			      const vec<slp_tree> &left_op,
-			      const vec<slp_tree> &right_op,
-			     bool neg_first, bool *conj_first_operand,
-			     bool fms)
-{
-  /* The presence of a negation indicates that we have either a conjugate or a
-     rotation.  We need to distinguish which one.  */
-  *conj_first_operand = false;
-  complex_perm_kinds_t kind;
-
-  /* Complex conjugates have the negation on the imaginary part of the
-     number where rotations affect the real component.  So check if the
-     negation is on a dup of lane 1.  */
-  if (fms)
+  if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
+    return false;
+
+  /* Only internal nodes can be loads, so for externals and constants all we
+     can do is compare the scalar operands.  */
+  if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
     {
-      /* Canonicalization for fms is not consistent. So have to test both
-	 variants to be sure.  This needs to be fixed in the mid-end so
-	 this part can be simpler.  */
-      kind = linear_loads_p (perm_cache, right_op[0]);
-      if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
-	   && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
-			     PERM_ODDEVEN))
-	  || (kind == PERM_ODDEVEN
-	      && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
-			     PERM_ODDODD))))
-	return false;
+      for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
+	{
+	  tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
+	  tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
+	  if (!operand_equal_p (op1, op2, 0))
+	    return false;
+	}
+
+      compat_cache->put (key, true);
+      return true;
     }
+
+  auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
+  auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
+
+  if (gimple_code (a_stmt) != gimple_code (b_stmt))
+    return false;
+
+  /* code, children, type, externals, loads, constants  */
+  if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
+    return false;
+
+  /* A and B now have the same gimple code and number of arguments.  */
+  if (is_gimple_call (a_stmt))
+    {
+      if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
+			       dyn_cast <gcall *> (b_stmt)))
+	return false;
+    }
+  else if (!is_gimple_assign (a_stmt))
+    return false;
   else
     {
-      if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
-	  && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
-			    PERM_ODDEVEN))
+      tree_code acode = gimple_assign_rhs_code (a_stmt);
+      tree_code bcode = gimple_assign_rhs_code (b_stmt);
+      if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
+	  && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
+	return true;
+
+      if (acode != bcode)
 	return false;
     }
 
-  /* Deal with differences in indexes.  */
-  int index1 = fms ? 1 : 0;
-  int index2 = fms ? 0 : 1;
-
-  /* Check if the conjugate is on the second first or second operand.  The
-     order of the node with the conjugate value determines this, and the dup
-     node must be one of lane 0 of the same DR as the neg node.  */
-  kind = linear_loads_p (perm_cache, left_op[index1]);
-  if (kind == PERM_TOP)
+  if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
+      || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
     {
-      if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
-	return true;
+      for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
+	{
+	  tree t1 = gimple_arg (a_stmt, i);
+	  tree t2 = gimple_arg (b_stmt, i);
+	  if (TREE_CODE (t1) != TREE_CODE (t2))
+	    return false;
+
+	  /* SSA names are compared when we recurse into the children,
+	     so skip them here.  */
+	  if (TREE_CODE (t1) == SSA_NAME)
+	    continue;
+
+	  if (!operand_equal_p (t1, t2, 0))
+	    return false;
+	}
     }
-  else if (kind == PERM_EVENODD && !neg_first)
+  else
     {
-      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN)
+      auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
+      auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
+      /* Don't check the last dimension as that's covered by the linear
+	 load checks.  This check is also much stricter than what we need
+	 because it doesn't consider loading from adjacent elements
+	 in the same struct as loading from the same base object.
+	 But for now, I'll play it safe.  */
+      if (!same_data_refs (dr1, dr2, 1))
 	return false;
-      return true;
     }
-  else if (kind == PERM_EVENEVEN && neg_first)
+
+  for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
     {
-      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD)
+      if (!compatible_complex_nodes_p (compat_cache,
+				       SLP_TREE_CHILDREN (a)[i], pa,
+				       SLP_TREE_CHILDREN (b)[i], pb))
 	return false;
-
-      *conj_first_operand = true;
-      return true;
     }
-  else
-    return false;
-
-  if (kind != PERM_EVENEVEN)
-    return false;
 
+  compat_cache->put (key, true);
   return true;
 }
 
-/* Helper function to help distinguish between a conjugate and a rotation in a
-   complex multiplication.  The operations have similar shapes but the order of
-   the load permutes are different.  This function returns TRUE when the order
-   is consistent with a multiplication or multiplication by conjugated
-   operand but returns FALSE if it's a multiplication by rotated operand.  */
-
 static inline bool
 vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
-			      const vec<slp_tree> &op,
-			      complex_perm_kinds_t permKind)
+			      slp_compat_nodes_map_t *compat_cache,
+			      vec<slp_tree> &left_op,
+			      vec<slp_tree> &right_op,
+			      bool subtract,
+			      enum _conj_status *_status)
 {
-  /* The left node is the more common case, test it first.  */
-  if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
+  auto_vec<slp_tree> ops;
+  enum _conj_status stats = CONJ_NONE;
+
+  /* The complex operations can occur in two layouts and two permute
+     sequences, so declare them once and re-use them.  */
+  int styles[][4] = { { 0, 2, 1, 3 } /* {L1, R1} + {L2, R2}.  */
+		    , { 0, 3, 1, 2 } /* {L1, R2} + {L2, R1}.  */
+		    };
+
+  /* Now for the corresponding permutes that go with these values.  */
+  complex_perm_kinds_t perms[][4]
+    = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
+      , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
+      };
+
+  /* These permutes are used during comparisons of externals on which
+     we require strict equality.  */
+  int cq[][4][2]
+    = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
+      , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
+      };
+
+  /* Default to style 0, and to perm 0 unless subtracting.  */
+  int style = 0;
+  int perm = subtract ? 1 : 0;
+
+  /* Check whether either right operand is a negate; if so, absorb the
+     negate node and continue matching on its operand.  */
+  bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
+  bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
+
+  /* Determine which permute sequence we're looking at.  They only differ
+     when a conjugate is involved.  */
+  if (neg0 && neg1)
+    ;
+  else if (neg0)
     {
-      if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
-	return false;
+      right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
+      stats = CONJ_FST;
+      if (subtract)
+	perm = 0;
     }
-  return true;
+  else if (neg1)
+    {
+      right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
+      stats = CONJ_SND;
+      perm = 1;
+    }
+
+  *_status = stats;
+
+  /* Flatten the inputs after we've remapped them.  */
+  ops.create (4);
+  ops.safe_splice (left_op);
+  ops.safe_splice (right_op);
+
+  /* Extract out the elements to check.  */
+  slp_tree op0 = ops[styles[style][0]];
+  slp_tree op1 = ops[styles[style][1]];
+  slp_tree op2 = ops[styles[style][2]];
+  slp_tree op3 = ops[styles[style][3]];
+
+  /* Do the cheapest test first; if it fails there is no need to go further.  */
+  if (linear_loads_p (perm_cache, op0) != perms[perm][0]
+      || linear_loads_p (perm_cache, op1) != perms[perm][1]
+      || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
+    return false;
+
+  return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
+				     cq[perm][1])
+	 && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
+					cq[perm][3]);
 }
 
 /* This function combines two nodes containing only even and only odd lanes
@@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern
 internal_fn
 complex_mul_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t *compat_cache,
 			      slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op,
       || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
     return IFN_LAST;
 
-  bool neg_first = false;
-  bool conj_first_operand = false;
-  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
+  enum _conj_status status;
+  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
+				     right_op, false, &status))
+    return IFN_LAST;
 
-  if (!is_neg)
+  if (status == CONJ_NONE)
     {
-      /* A multiplication needs to multiply agains the real pair, otherwise
-	 the pattern matches that of FMS.   */
-      if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
-	  || vect_normalize_conj_loc (left_op))
-	return IFN_LAST;
       if (add0)
 	ifn = IFN_COMPLEX_FMA;
       else
@@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op,
     }
   else
     {
-      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
-					 neg_first, &conj_first_operand,
-					 false))
-	return IFN_LAST;
-
       if(add0)
 	ifn = IFN_COMPLEX_FMA_CONJ;
       else
@@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op,
     ops->quick_push (add0);
 
   complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
-  if (kind == PERM_EVENODD)
+  if (kind == PERM_EVENODD || kind == PERM_TOP)
     {
       ops->quick_push (left_op[1]);
       ops->quick_push (right_op[1]);
       ops->quick_push (left_op[0]);
     }
-  else if (kind == PERM_TOP)
-    {
-      ops->quick_push (left_op[1]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (left_op[0]);
-    }
-  else if (kind == PERM_EVENEVEN && !conj_first_operand)
+  else if (kind == PERM_EVENEVEN && status != CONJ_SND)
     {
       ops->quick_push (left_op[0]);
       ops->quick_push (right_op[0]);
@@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_mul_pattern::matches (op, perm_cache, node, &ops);
+    = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo)
 
 	/* First re-arrange the children.  */
 	SLP_TREE_CHILDREN (*this->m_node).safe_grow (3);
-	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0];
-	SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3];
-	SLP_TREE_CHILDREN (*this->m_node)[2] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[3];
+	SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0];
 
 	/* Tell the builder to expect an extra argument.  */
 	this->m_num_args++;
@@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern
 internal_fn
 complex_fms_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t *compat_cache,
 			      slp_tree * ref_node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op,
   if (!vect_match_expression_p (root, MINUS_EXPR))
     return IFN_LAST;
 
+  /* TODO: Support invariants here.  With the new layout CADD can now
+     match before we get a chance to try CFMS.  */
   auto nodes = SLP_TREE_CHILDREN (root);
   if (!vect_match_expression_p (nodes[1], MULT_EXPR)
       || vect_detect_pair_op (nodes[0]) != PLUS_MINUS)
@@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op,
       || !vect_match_expression_p (l0node[1], MULT_EXPR))
     return IFN_LAST;
 
-  bool is_neg = vect_normalize_conj_loc (left_op);
-
-  bool conj_first_operand = false;
-  if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
-				     &conj_first_operand, true))
+  enum _conj_status status;
+  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
+				     left_op, true, &status))
     return IFN_LAST;
 
-  if (!is_neg)
+  if (status == CONJ_NONE)
     ifn = IFN_COMPLEX_FMS;
-  else if (is_neg)
+  else
     ifn = IFN_COMPLEX_FMS_CONJ;
 
   if (!vect_pattern_validate_optab (ifn, *ref_node))
@@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op,
       ops->quick_push (right_op[1]);
       ops->quick_push (left_op[1]);
     }
-  else if (kind == PERM_TOP)
-    {
-      ops->quick_push (l0node[0]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[0]);
-    }
-  else if (kind == PERM_EVENEVEN && !is_neg)
-    {
-      ops->quick_push (l0node[0]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[0]);
-    }
   else
     {
       ops->quick_push (l0node[0]);
       ops->quick_push (right_op[1]);
       ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[1]);
+      ops->quick_push (left_op[0]);
     }
 
   return ifn;
@@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_fms_pattern::matches (op, perm_cache, node, &ops);
+    = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -1305,9 +1353,9 @@ complex_fms_pattern::build (vec_info *vinfo)
   SLP_TREE_CHILDREN (*this->m_node).create (3);
 
   /* First re-arrange the children.  */
-  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
   SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
   SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
+  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
 
   /* And then rewrite the node itself.  */
   complex_pattern::build (vinfo);
@@ -1334,11 +1382,12 @@ class complex_operations_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 };
 
 /* Dummy matches implementation for proxy object.  */
@@ -1347,6 +1396,7 @@ internal_fn
 complex_operations_pattern::
 matches (complex_operation_t /* op */,
 	 slp_tree_to_load_perm_map_t * /* perm_cache */,
+	 slp_compat_nodes_map_t * /* compat_cache */,
 	 slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
 {
   return IFN_LAST;
@@ -1356,6 +1406,7 @@ matches (complex_operation_t /* op */,
 
 vect_pattern*
 complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				       slp_compat_nodes_map_t *ccache,
 				       slp_tree *node)
 {
   auto_vec<slp_tree> ops;
@@ -1363,15 +1414,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn = IFN_LAST;
 
-  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_fms_pattern::mkInstance (node, &ops, ifn);
 
-  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_mul_pattern::mkInstance (node, &ops, ifn);
 
-  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_add_pattern::mkInstance (node, &ops, ifn);
 
@@ -1398,11 +1449,13 @@ class addsub_pattern : public vect_pattern
     void build (vec_info *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 };
 
 vect_pattern *
-addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
+addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
+			   slp_compat_nodes_map_t *, slp_tree *node_)
 {
   slp_tree node = *node_;
   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
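To show why vect_validate_multiplication expects the perms[0] row, here is a hedged C++ model. It is not GCC code. It splits complex multiplication on interleaved (re, im) lanes into two lane-wise multiplies, and adds a table lookup in the spirit of styles[0]/perms[0]. `PermKind`, `permute`, `lane_complex_mul` and `matches_mul_layout` are hypothetical names. Only the EVENEVEN/ODDODD/EVENODD/ODDEVEN naming follows complex_perm_kinds_t, and the lane selectors are the same pairs as the cq[0] table. The sketch also assumes that left_op and right_op are the operands of the first and second multiply respectively.

```cpp
#include <array>
#include <cassert>
#include <complex>  // used to cross-check results against std::complex
#include <cstddef>
#include <vector>

// Interleaved layout: v[2k] is the real part and v[2k + 1] the imaginary
// part of complex element k.
using Vec = std::vector<double>;

// Names follow complex_perm_kinds_t; TOP stands in for PERM_TOP and is only
// meaningful to the matcher, not to permute.
enum PermKind { EVENEVEN, ODDODD, EVENODD, ODDEVEN, TOP };

// Apply a lane permute within each (re, im) pair.
Vec
permute (const Vec &v, PermKind kind)
{
  assert (kind != TOP);
  std::array<int, 2> sel = { 0, 1 };  // EVENODD: identity
  if (kind == EVENEVEN)
    sel = { 0, 0 };
  else if (kind == ODDODD)
    sel = { 1, 1 };
  else if (kind == ODDEVEN)
    sel = { 1, 0 };

  Vec out (v.size ());
  for (std::size_t i = 0; i + 1 < v.size (); i += 2)
    {
      out[i] = v[i + sel[0]];
      out[i + 1] = v[i + sel[1]];
    }
  return out;
}

// a * b as two lane-wise multiplies and an add/sub:
//   t1 = EVENEVEN (a) * EVENODD (b)  -> { ar*br, ar*bi }
//   t2 = ODDODD (a)   * ODDEVEN (b)  -> { ai*bi, ai*br }
//   even lanes: t1 - t2, odd lanes: t1 + t2.
Vec
lane_complex_mul (const Vec &a, const Vec &b)
{
  Vec l1 = permute (a, EVENEVEN), l2 = permute (b, EVENODD);
  Vec r1 = permute (a, ODDODD), r2 = permute (b, ODDEVEN);
  Vec out (a.size ());
  for (std::size_t i = 0; i < a.size (); i++)
    {
      double t1 = l1[i] * l2[i];
      double t2 = r1[i] * r2[i];
      out[i] = (i % 2 == 0) ? t1 - t2 : t1 + t2;
    }
  return out;
}

// OPS holds the load permutes of {L1, L2, R1, R2}.  Reorder them with
// style {0, 2, 1, 3} and compare against the expected kinds; the last two
// may also be TOP, as with is_eq_or_top.
bool
matches_mul_layout (const std::array<PermKind, 4> &ops)
{
  const int style[4] = { 0, 2, 1, 3 };
  const PermKind expected[4] = { EVENEVEN, ODDODD, EVENODD, ODDEVEN };
  for (int j = 0; j < 4; j++)
    {
      PermKind k = ops[style[j]];
      if (k != expected[j] && !(j >= 2 && k == TOP))
        return false;
    }
  return true;
}
```

Entries 0/1 and 2/3 of the reordered tuple are also the pairs handed to compatible_complex_nodes_p. The two permutes of a must read the same value, and so must the two permutes of b. In pr102819-2.c the second operands of the two multiplies add v1 and v2 respectively to f[2]. They are therefore not EVENODD and ODDEVEN permutes of one value, and COMPLEX_MUL must not be formed.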
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
 /* Return true if call statements CALL1 and CALL2 are similar enough
    to be combined into the same SLP group.  */
 
-static bool
+bool
 compatible_calls_p (gcall *call1, gcall *call2)
 {
   unsigned int nargs = gimple_call_num_args (call1);
@@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
 static bool
 vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
 			   slp_tree_to_load_perm_map_t *perm_cache,
+			   slp_compat_nodes_map_t *compat_cache,
 			   hash_set<slp_tree> *visited)
 {
   unsigned i;
@@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
   slp_tree child;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
     found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
-					  vinfo, perm_cache, visited);
+					  vinfo, perm_cache, compat_cache,
+					  visited);
 
   for (unsigned x = 0; x < num__slp_patterns; x++)
     {
-      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
+      vect_pattern *pattern
+	= slp_patterns[x] (perm_cache, compat_cache, ref_node);
       if (pattern)
 	{
 	  pattern->build (vinfo);
@@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
 static bool
 vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 			 hash_set<slp_tree> *visited,
-			 slp_tree_to_load_perm_map_t *perm_cache)
+			 slp_tree_to_load_perm_map_t *perm_cache,
+			 slp_compat_nodes_map_t *compat_cache)
 {
   DUMP_VECT_SCOPE ("vect_match_slp_patterns");
   slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
@@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 		     "Analyzing SLP tree %p for patterns\n",
 		     SLP_INSTANCE_TREE (instance));
 
-  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
+  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
+				    visited);
 }
 
 /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
@@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 
   hash_set<slp_tree> visited_patterns;
   slp_tree_to_load_perm_map_t perm_cache;
+  slp_compat_nodes_map_t compat_cache;
 
   /* See if any patterns can be found in the SLP tree.  */
   bool pattern_found = false;
   FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
     pattern_found |= vect_match_slp_patterns (instance, vinfo,
-					      &visited_patterns, &perm_cache);
+					      &visited_patterns, &perm_cache,
+					      &compat_cache);
 
   /* If any were found optimize permutations of loads.  */
   if (pattern_found)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
 extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
 extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
 extern void vect_free_slp_tree (slp_tree);
+extern bool compatible_calls_p (gcall *, gcall *);
 
 /* In tree-vect-patterns.c.  */
 extern void
@@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds {
 typedef hash_map <slp_tree, complex_perm_kinds_t>
   slp_tree_to_load_perm_map_t;
 
+/* Cache mapping a pair of SLP nodes to whether they are compatible.  */
+typedef pair_hash <nofree_ptr_hash <_slp_tree>,
+		   nofree_ptr_hash <_slp_tree>> slp_node_hash;
+typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
+
+
 /* Vector pattern matcher base class.  All SLP pattern matchers must inherit
    from this type.  */
 
@@ -2338,7 +2345,8 @@ class vect_pattern
   public:
 
     /* Create a new instance of the pattern matcher class of the given type.  */
-    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
+				    slp_compat_nodes_map_t *, slp_tree *);
 
     /* Build the pattern from the data collected so far.  */
     virtual void build (vec_info *) = 0;
@@ -2352,6 +2360,7 @@ class vect_pattern
 
 /* Function pointer to create a new pattern matcher from a generic type.  */
 typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
+					      slp_compat_nodes_map_t *,
 					      slp_tree *);
 
 /* List of supported pattern matchers.  */
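compatible_complex_nodes_p memoizes its answer per (a, b) pair in slp_compat_nodes_map_t, and it stores a provisional false before recursing. Below is a hedged, simplified C++ sketch of that scheme, with std::map standing in for the hash_map over pair_hash keys. `Node`, `CompatCache` and `compatible_p` are hypothetical. They compare only an operation code and the children, not loads, externals or calls.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, much-simplified SLP node: an operation code and children.
struct Node
{
  std::string code;
  std::vector<const Node *> children;
};

// Keyed on the ordered pair (a, b), like slp_compat_nodes_map_t.
using CompatCache = std::map<std::pair<const Node *, const Node *>, bool>;

bool
compatible_p (CompatCache &cache, const Node *a, const Node *b)
{
  const std::pair<const Node *, const Node *> key (a, b);
  auto it = cache.find (key);
  if (it != cache.end ())
    return it->second;

  // Provisional answer: if this pair is reached again while it is still
  // being compared, the lookup above stops the recursion.
  cache[key] = false;

  if (a->code != b->code || a->children.size () != b->children.size ())
    return false;
  for (std::size_t i = 0; i < a->children.size (); i++)
    if (!compatible_p (cache, a->children[i], b->children[i]))
      return false;

  // Only record success once every check has passed.
  cache[key] = true;
  return true;
}
```

Seeding the entry with false means a failed comparison needs no extra bookkeeping. Its entry simply stays false, and later queries for the same pair are answered from the cache.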

[-- Attachment #2: rb15145.patch --]
[-- Type: application/octet-stream, Size: 41026 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..ad06b02d36876082afe4c3f3fb51887f7a522b23 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6325,12 +6325,13 @@ Perform a vector multiply and accumulate that is semantically the same as
 a multiply and accumulate of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
+  complex TYPE op3[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * b[i];
+      op0[i] = op1[i] * op2[i] + op3[i];
     @}
 @end smallexample
 
@@ -6348,12 +6349,13 @@ the same as a multiply and accumulate of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
+  complex TYPE op3[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] += a[i] * conj (b[i]);
+      op0[i] = op1[i] * conj (op2[i]) + op3[i];
     @}
 @end smallexample
 
@@ -6370,12 +6372,13 @@ Perform a vector multiply and subtract that is semantically the same as
 a multiply and subtract of complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
+  complex TYPE op3[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * b[i];
+      op0[i] = op1[i] * op2[i] - op3[i];
     @}
 @end smallexample
 
@@ -6393,12 +6396,13 @@ the same as a multiply and subtract of complex numbers where the second
 multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
+  complex TYPE op3[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] -= a[i] * conj (b[i]);
+      op0[i] = op1[i] * conj (op2[i]) - op3[i];
     @}
 @end smallexample
 
@@ -6415,12 +6419,12 @@ Perform a vector multiply that is semantically the same as multiply of
 complex numbers.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * b[i];
+      op0[i] = op1[i] * op2[i];
     @}
 @end smallexample
 
@@ -6437,12 +6441,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
 multiply of complex numbers where the second multiply arguments is conjugated.
 
 @smallexample
-  complex TYPE c[N];
-  complex TYPE a[N];
-  complex TYPE b[N];
+  complex TYPE op0[N];
+  complex TYPE op1[N];
+  complex TYPE op2[N];
   for (int i = 0; i < N; i += 1)
     @{
-      c[i] = a[i] * conj (b[i]);
+      op0[i] = op1[i] * conj (op2[i]);
     @}
 @end smallexample
 
diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
--- a/gcc/testsuite/g++.dg/vect/pr99149.cc
+++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
@@ -24,4 +24,4 @@ public:
 } n;
 main() { n.j(); }
 
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 4)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
+      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
+      //                  ^^^^^^^             ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1(float v1, float v2)
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
+      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void good2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
+    }
+}
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad1()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
+      //                  ^^^^^^^             ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad2()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
+      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
+      //                          ^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+float f[12][100];
+
+void bad3()
+{
+  for (int r = 0; r < 100; r += 2)
+    {
+      int i = r + 1;
+      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
+      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
+      //                            ^^^^^^^
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+
+#include <stdio.h>
+#include <complex.h>
+
+#define N 200
+#define TYPE float
+#define TYPE2 float
+
+void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
+{
+  for (int i=0; i < N; i++)
+    {
+      c[i] -= a[i] * b[0];
+    }
+}
+
+/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.  */
+
+/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
new file mode 100644
index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { vect_double } } } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
+
+_Complex double b_0, c_0;
+
+void
+mul270snd (void)
+{
+  c_0 = b_0 * 1.0iF * 1.0iF;
+}
+
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
 }
 
 /* Return true when the data references A and B are accessing the same
-   memory object with the same access functions.  */
+   memory object with the same access functions.  Optionally skip the
+   last OFFSET dimensions in the data reference.  */
 
 static inline bool
-same_data_refs (data_reference_p a, data_reference_p b)
+same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
 {
   unsigned int i;
 
@@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
   if (!same_data_refs_base_objects (a, b))
     return false;
 
-  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
+  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
     if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
       return false;
 
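The following sketch makes the new OFFSET parameter concrete. It uses a hypothetical `toy_dr` type that reduces each access function to an integer offset. Reading the patch comment together with the loop, the sketch assumes index 0 holds the last (innermost) subscript, so an OFFSET of 1 ignores the column and compares only the row. This is an illustration, not GCC's data_reference API.

```cpp
// Hypothetical stand-in for a data reference (not GCC's data_reference):
// a base object plus one access function per dimension, each reduced here
// to an integer offset.  Index 0 is assumed to be the last (innermost)
// subscript in source order.
struct toy_dr
{
  const void *base;
  int num_dims;
  int access_fn[4];
};

// Mirrors the patched loop in same_data_refs: starting at OFFSET skips the
// last OFFSET subscripts.
static bool
toy_same_data_refs (const toy_dr &a, const toy_dr &b, int offset = 0)
{
  if (a.base != b.base || a.num_dims != b.num_dims)
    return false;
  for (int i = offset; i < a.num_dims; i++)
    if (a.access_fn[i] != b.access_fn[i])
      return false;
  return true;
}
```

With this encoding, `f[1][r]` and `f[1][r + 1]` compare equal only when OFFSET is 1. `f[2][r + 1]` and `f[3][r]` still differ in the row, which is the kind of mismatch pr102819-6.c is meant to reject.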
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 0350441fad9690cd5d04337171ca3470a064a571..020c29bba08c5bd80503a2dbc04292f8fd310b3c 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
   int valid_patterns = 4;
   FOR_EACH_VEC_ELT (loads, i, load)
     {
-      if (candidates[0] != PERM_UNKNOWN && load != 1)
+      unsigned adj_load = load % 2;
+      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
 	{
 	  candidates[0] = PERM_UNKNOWN;
 	  valid_patterns--;
 	}
-      if (candidates[1] != PERM_UNKNOWN && load != 0)
+      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
 	{
 	  candidates[1] = PERM_UNKNOWN;
 	  valid_patterns--;
@@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
 internal_fn
 complex_add_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t * /* compat_cache */,
 			      slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_add_pattern::matches (op, perm_cache, node, &ops);
+    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
  * complex_mul_pattern
  ******************************************************************************/
 
-/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
-   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
-
-   If a negate is found then the values in ARGS are reordered such that the
-   negate node is always the second one and the entry is replaced by the child
-   of the negate node.  */
+/* Return true if the load permutes of OP1/OP2 are KIND1/KIND2 or PERM_TOP.  */
 
 static inline bool
-vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL)
+is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
+	      slp_tree op1, complex_perm_kinds_t kind1,
+	      slp_tree op2, complex_perm_kinds_t kind2)
 {
-  gcc_assert (args.length () == 2);
-  bool neg_found = false;
-
-  if (vect_match_expression_p (args[0], NEGATE_EXPR))
-    {
-      std::swap (args[0], args[1]);
-      neg_found = true;
-      if (neg_first_p)
-	*neg_first_p = true;
-    }
-  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
-    {
-      neg_found = true;
-      if (neg_first_p)
-	*neg_first_p = false;
-    }
+  complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
+  if (perm1 != kind1 && perm1 != PERM_TOP)
+    return false;
 
-  if (neg_found)
-    args[1] = SLP_TREE_CHILDREN (args[1])[0];
+  complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
+  if (perm2 != kind2 && perm2 != PERM_TOP)
+    return false;
 
-  return neg_found;
+  return true;
 }
 
-/* Helper function to check if PERM is KIND or PERM_TOP.  */
+enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
 
 static inline bool
-is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
+compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
+			    slp_tree a, int *pa, slp_tree b, int *pb)
 {
-  return perm == kind || perm == PERM_TOP;
-}
+  bool *tmp;
+  std::pair<slp_tree, slp_tree> key = std::make_pair (a, b);
+  if ((tmp = compat_cache->get (key)) != NULL)
+    return *tmp;
 
-/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
-   nodes but also that they represent an operation that is either a complex
-   multiplication or a complex multiplication by conjugated value.
+  compat_cache->put (key, false);
 
-   Of the negation is expected to be in the first half of the tree (As required
-   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjugate
-   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
-   second operand contains the conjugate operation.  */
+  if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
+    return false;
 
-static inline bool
-vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
-			      const vec<slp_tree> &left_op,
-			      const vec<slp_tree> &right_op,
-			     bool neg_first, bool *conj_first_operand,
-			     bool fms)
-{
-  /* The presence of a negation indicates that we have either a conjugate or a
-     rotation.  We need to distinguish which one.  */
-  *conj_first_operand = false;
-  complex_perm_kinds_t kind;
-
-  /* Complex conjugates have the negation on the imaginary part of the
-     number where rotations affect the real component.  So check if the
-     negation is on a dup of lane 1.  */
-  if (fms)
+  if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
+    return false;
+
+  /* Only internal nodes can be loads, so for externals and constants all we
+     can do is compare the scalar operands directly.  */
+  if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
     {
-      /* Canonicalization for fms is not consistent. So have to test both
-	 variants to be sure.  This needs to be fixed in the mid-end so
-	 this part can be simpler.  */
-      kind = linear_loads_p (perm_cache, right_op[0]);
-      if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
-	   && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
-			     PERM_ODDEVEN))
-	  || (kind == PERM_ODDEVEN
-	      && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
-			     PERM_ODDODD))))
-	return false;
+      for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
+	{
+	  tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
+	  tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
+	  if (!operand_equal_p (op1, op2, 0))
+	    return false;
+	}
+
+      compat_cache->put (key, true);
+      return true;
     }
+
+  auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
+  auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
+
+  if (gimple_code (a_stmt) != gimple_code (b_stmt))
+    return false;
+
+  /* Now compare code, arity, externals, loads, constants and children.  */
+  if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
+    return false;
+
+  /* At this point, a and b are known to be the same gimple operations.  */
+  if (is_gimple_call (a_stmt))
+    {
+      if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
+			       dyn_cast <gcall *> (b_stmt)))
+	return false;
+    }
+  else if (!is_gimple_assign (a_stmt))
+    return false;
   else
     {
-      if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
-	  && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
-			    PERM_ODDEVEN))
+      tree_code acode = gimple_assign_rhs_code (a_stmt);
+      tree_code bcode = gimple_assign_rhs_code (b_stmt);
+      if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
+	  && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
+	return true;
+
+      if (acode != bcode)
 	return false;
     }
 
-  /* Deal with differences in indexes.  */
-  int index1 = fms ? 1 : 0;
-  int index2 = fms ? 0 : 1;
-
-  /* Check if the conjugate is on the second first or second operand.  The
-     order of the node with the conjugate value determines this, and the dup
-     node must be one of lane 0 of the same DR as the neg node.  */
-  kind = linear_loads_p (perm_cache, left_op[index1]);
-  if (kind == PERM_TOP)
+  if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
+      || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
     {
-      if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
-	return true;
+      for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
+	{
+	  tree t1 = gimple_arg (a_stmt, i);
+	  tree t2 = gimple_arg (b_stmt, i);
+	  if (TREE_CODE (t1) != TREE_CODE (t2))
+	    return false;
+
+	  /* If SSA name then we will need to inspect the children
+	     so we can punt here.  */
+	  if (TREE_CODE (t1) == SSA_NAME)
+	    continue;
+
+	  if (!operand_equal_p (t1, t2, 0))
+	    return false;
+	}
     }
-  else if (kind == PERM_EVENODD && !neg_first)
+  else
     {
-      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN)
+      auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
+      auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
+      /* Don't check the last dimension as that's checked by the linearity
+	 checks.  This check is also much stricter than what we need
+	 because it doesn't consider loading from adjacent elements
+	 in the same struct as loading from the same base object.
+	 But for now, I'll play it safe.  */
+      if (!same_data_refs (dr1, dr2, 1))
 	return false;
-      return true;
     }
-  else if (kind == PERM_EVENEVEN && neg_first)
+
+  for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
     {
-      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD)
+      if (!compatible_complex_nodes_p (compat_cache,
+				       SLP_TREE_CHILDREN (a)[i], pa,
+				       SLP_TREE_CHILDREN (b)[i], pb))
 	return false;
-
-      *conj_first_operand = true;
-      return true;
     }
-  else
-    return false;
-
-  if (kind != PERM_EVENEVEN)
-    return false;
 
+  compat_cache->put (key, true);
   return true;
 }
 
-/* Helper function to help distinguish between a conjugate and a rotation in a
-   complex multiplication.  The operations have similar shapes but the order of
-   the load permutes are different.  This function returns TRUE when the order
-   is consistent with a multiplication or multiplication by conjugated
-   operand but returns FALSE if it's a multiplication by rotated operand.  */
-
 static inline bool
 vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
-			      const vec<slp_tree> &op,
-			      complex_perm_kinds_t permKind)
+			      slp_compat_nodes_map_t *compat_cache,
+			      vec<slp_tree> &left_op,
+			      vec<slp_tree> &right_op,
+			      bool subtract,
+			      enum _conj_status *_status)
 {
-  /* The left node is the more common case, test it first.  */
-  if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
+  auto_vec<slp_tree> ops;
+  enum _conj_status stats = CONJ_NONE;
+
+  /* The complex operations can occur in two layouts and two permute
+     sequences, so declare those tables once and re-use them.  */
+  int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}.  */
+		    , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}.  */
+		    };
+
+  /* Now for the corresponding permutes that go with these values.  */
+  complex_perm_kinds_t perms[][4]
+    = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
+      , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
+      };
+
+  /* These permutes are used during comparisons of externals on which
+     we require strict equality.  */
+  int cq[][4][2]
+    = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
+      , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
+      };
+
+  /* Default to style and perm 0, most operations use this one.  */
+  int style = 0;
+  int perm = subtract ? 1 : 0;
+
+  /* Check if we have a negate operation; if so, absorb the node and continue
+     looking.  */
+  bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
+  bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
+
+  /* Determine which style we're looking at.  We only have different ones
+     whenever a conjugate is involved.  */
+  if (neg0 && neg1)
+    ;
+  else if (neg0)
     {
-      if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
-	return false;
+      right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
+      stats = CONJ_FST;
+      if (subtract)
+	perm = 0;
     }
-  return true;
+  else if (neg1)
+    {
+      right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
+      stats = CONJ_SND;
+      perm = 1;
+    }
+
+  *_status = stats;
+
+  /* Flatten the inputs after we've remapped them.  */
+  ops.create (4);
+  ops.safe_splice (left_op);
+  ops.safe_splice (right_op);
+
+  /* Extract out the elements to check.  */
+  slp_tree op0 = ops[styles[style][0]];
+  slp_tree op1 = ops[styles[style][1]];
+  slp_tree op2 = ops[styles[style][2]];
+  slp_tree op3 = ops[styles[style][3]];
+
+  /* Do the cheapest test first; if it fails, skip the deeper analysis.  */
+  if (linear_loads_p (perm_cache, op0) != perms[perm][0]
+      || linear_loads_p (perm_cache, op1) != perms[perm][1]
+      || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
+    return false;
+
+  return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
+				     cq[perm][1])
+	 && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
+					cq[perm][3]);
 }
 
 /* This function combines two nodes containing only even and only odd lanes
@@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern
 internal_fn
 complex_mul_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t *compat_cache,
 			      slp_tree *node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op,
       || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
     return IFN_LAST;
 
-  bool neg_first = false;
-  bool conj_first_operand = false;
-  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
+  enum _conj_status status;
+  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
+				     right_op, false, &status))
+    return IFN_LAST;
 
-  if (!is_neg)
+  if (status == CONJ_NONE)
     {
-      /* A multiplication needs to multiply agains the real pair, otherwise
-	 the pattern matches that of FMS.   */
-      if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
-	  || vect_normalize_conj_loc (left_op))
-	return IFN_LAST;
       if (add0)
 	ifn = IFN_COMPLEX_FMA;
       else
@@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op,
     }
   else
     {
-      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
-					 neg_first, &conj_first_operand,
-					 false))
-	return IFN_LAST;
-
       if(add0)
 	ifn = IFN_COMPLEX_FMA_CONJ;
       else
@@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op,
     ops->quick_push (add0);
 
   complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
-  if (kind == PERM_EVENODD)
+  if (kind == PERM_EVENODD || kind == PERM_TOP)
     {
       ops->quick_push (left_op[1]);
       ops->quick_push (right_op[1]);
       ops->quick_push (left_op[0]);
     }
-  else if (kind == PERM_TOP)
-    {
-      ops->quick_push (left_op[1]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (left_op[0]);
-    }
-  else if (kind == PERM_EVENEVEN && !conj_first_operand)
+  else if (kind == PERM_EVENEVEN && status != CONJ_SND)
     {
       ops->quick_push (left_op[0]);
       ops->quick_push (right_op[0]);
@@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_mul_pattern::matches (op, perm_cache, node, &ops);
+    = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo)
 
 	/* First re-arrange the children.  */
 	SLP_TREE_CHILDREN (*this->m_node).safe_grow (3);
-	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0];
-	SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3];
-	SLP_TREE_CHILDREN (*this->m_node)[2] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[3];
+	SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
+	SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0];
 
 	/* Tell the builder to expect an extra argument.  */
 	this->m_num_args++;
@@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 
     static vect_pattern*
     mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
@@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern
 internal_fn
 complex_fms_pattern::matches (complex_operation_t op,
 			      slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_compat_nodes_map_t *compat_cache,
 			      slp_tree * ref_node, vec<slp_tree> *ops)
 {
   internal_fn ifn = IFN_LAST;
@@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op,
   if (!vect_match_expression_p (root, MINUS_EXPR))
     return IFN_LAST;
 
+  /* TODO: Support invariants here.  With the new layout CADD can now
+     match before we get a chance to try CFMS.  */
   auto nodes = SLP_TREE_CHILDREN (root);
   if (!vect_match_expression_p (nodes[1], MULT_EXPR)
       || vect_detect_pair_op (nodes[0]) != PLUS_MINUS)
@@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op,
       || !vect_match_expression_p (l0node[1], MULT_EXPR))
     return IFN_LAST;
 
-  bool is_neg = vect_normalize_conj_loc (left_op);
-
-  bool conj_first_operand = false;
-  if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
-				     &conj_first_operand, true))
+  enum _conj_status status;
+  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
+				     left_op, true, &status))
     return IFN_LAST;
 
-  if (!is_neg)
+  if (status == CONJ_NONE)
     ifn = IFN_COMPLEX_FMS;
-  else if (is_neg)
+  else
     ifn = IFN_COMPLEX_FMS_CONJ;
 
   if (!vect_pattern_validate_optab (ifn, *ref_node))
@@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op,
       ops->quick_push (right_op[1]);
       ops->quick_push (left_op[1]);
     }
-  else if (kind == PERM_TOP)
-    {
-      ops->quick_push (l0node[0]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[0]);
-    }
-  else if (kind == PERM_EVENEVEN && !is_neg)
-    {
-      ops->quick_push (l0node[0]);
-      ops->quick_push (right_op[1]);
-      ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[0]);
-    }
   else
     {
       ops->quick_push (l0node[0]);
       ops->quick_push (right_op[1]);
       ops->quick_push (right_op[0]);
-      ops->quick_push (left_op[1]);
+      ops->quick_push (left_op[0]);
     }
 
   return ifn;
@@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op,
 
 vect_pattern*
 complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				slp_compat_nodes_map_t *compat_cache,
 				slp_tree *node)
 {
   auto_vec<slp_tree> ops;
   complex_operation_t op
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn
-    = complex_fms_pattern::matches (op, perm_cache, node, &ops);
+    = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
   if (ifn == IFN_LAST)
     return NULL;
 
@@ -1305,9 +1353,9 @@ complex_fms_pattern::build (vec_info *vinfo)
   SLP_TREE_CHILDREN (*this->m_node).create (3);
 
   /* First re-arrange the children.  */
-  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
   SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
   SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
+  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
 
   /* And then rewrite the node itself.  */
   complex_pattern::build (vinfo);
@@ -1334,11 +1382,12 @@ class complex_operations_pattern : public complex_pattern
   public:
     void build (vec_info *);
     static internal_fn
-    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
-	     vec<slp_tree> *);
+    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 };
 
 /* Dummy matches implementation for proxy object.  */
@@ -1347,6 +1396,7 @@ internal_fn
 complex_operations_pattern::
 matches (complex_operation_t /* op */,
 	 slp_tree_to_load_perm_map_t * /* perm_cache */,
+	 slp_compat_nodes_map_t * /* compat_cache */,
 	 slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
 {
   return IFN_LAST;
@@ -1356,6 +1406,7 @@ matches (complex_operation_t /* op */,
 
 vect_pattern*
 complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+				       slp_compat_nodes_map_t *ccache,
 				       slp_tree *node)
 {
   auto_vec<slp_tree> ops;
@@ -1363,15 +1414,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
     = vect_detect_pair_op (*node, true, &ops);
   internal_fn ifn = IFN_LAST;
 
-  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_fms_pattern::mkInstance (node, &ops, ifn);
 
-  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_mul_pattern::mkInstance (node, &ops, ifn);
 
-  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
+  ifn  = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
   if (ifn != IFN_LAST)
     return complex_add_pattern::mkInstance (node, &ops, ifn);
 
@@ -1398,11 +1449,13 @@ class addsub_pattern : public vect_pattern
     void build (vec_info *);
 
     static vect_pattern*
-    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
+	       slp_tree *);
 };
 
 vect_pattern *
-addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
+addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
+			   slp_compat_nodes_map_t *, slp_tree *node_)
 {
   slp_tree node = *node_;
   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
 /* Return true if call statements CALL1 and CALL2 are similar enough
    to be combined into the same SLP group.  */
 
-static bool
+bool
 compatible_calls_p (gcall *call1, gcall *call2)
 {
   unsigned int nargs = gimple_call_num_args (call1);
@@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
 static bool
 vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
 			   slp_tree_to_load_perm_map_t *perm_cache,
+			   slp_compat_nodes_map_t *compat_cache,
 			   hash_set<slp_tree> *visited)
 {
   unsigned i;
@@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
   slp_tree child;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
     found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
-					  vinfo, perm_cache, visited);
+					  vinfo, perm_cache, compat_cache,
+					  visited);
 
   for (unsigned x = 0; x < num__slp_patterns; x++)
     {
-      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
+      vect_pattern *pattern
+	= slp_patterns[x] (perm_cache, compat_cache, ref_node);
       if (pattern)
 	{
 	  pattern->build (vinfo);
@@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
 static bool
 vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 			 hash_set<slp_tree> *visited,
-			 slp_tree_to_load_perm_map_t *perm_cache)
+			 slp_tree_to_load_perm_map_t *perm_cache,
+			 slp_compat_nodes_map_t *compat_cache)
 {
   DUMP_VECT_SCOPE ("vect_match_slp_patterns");
   slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
@@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
 		     "Analyzing SLP tree %p for patterns\n",
 		     SLP_INSTANCE_TREE (instance));
 
-  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
+  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
+				    visited);
 }
 
 /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
@@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 
   hash_set<slp_tree> visited_patterns;
   slp_tree_to_load_perm_map_t perm_cache;
+  slp_compat_nodes_map_t compat_cache;
 
   /* See if any patterns can be found in the SLP tree.  */
   bool pattern_found = false;
   FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
     pattern_found |= vect_match_slp_patterns (instance, vinfo,
-					      &visited_patterns, &perm_cache);
+					      &visited_patterns, &perm_cache,
+					      &compat_cache);
 
   /* If any were found optimize permutations of loads.  */
   if (pattern_found)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
 extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
 extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
 extern void vect_free_slp_tree (slp_tree);
+extern bool compatible_calls_p (gcall *, gcall *);
 
 /* In tree-vect-patterns.c.  */
 extern void
@@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds {
 typedef hash_map <slp_tree, complex_perm_kinds_t>
   slp_tree_to_load_perm_map_t;
 
+/* Cache mapping a pair of nodes to whether they are compatible.  */
+typedef pair_hash <nofree_ptr_hash <_slp_tree>,
+		   nofree_ptr_hash <_slp_tree>> slp_node_hash;
+typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
+
+
 /* Vector pattern matcher base class.  All SLP pattern matchers must inherit
    from this type.  */
 
@@ -2338,7 +2345,8 @@ class vect_pattern
   public:
 
     /* Create a new instance of the pattern matcher class of the given type.  */
-    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
+    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
+				    slp_compat_nodes_map_t *, slp_tree *);
 
     /* Build the pattern from the data collected so far.  */
     virtual void build (vec_info *) = 0;
@@ -2352,6 +2360,7 @@ class vect_pattern
 
 /* Function pointer to create a new pattern matcher from a generic type.  */
 typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
+					      slp_compat_nodes_map_t *,
 					      slp_tree *);
 
 /* List of supported pattern matchers.  */
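
For illustration only, here is a minimal sketch of the kind of memoised pairwise check that a cache like slp_compat_nodes_map_t supports. The Node type, its integer operation codes and compatible_p are hypothetical simplifications. They are not GCC's _slp_tree, hash_map or compatible_complex_nodes_p, and std::map stands in for hash_map with pair_hash. Two nodes count as compatible when they apply the same operation to pairwise-compatible operands, and leaves are compared by the object they read.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an SLP node.  A leaf (code 0)
// names the base object it reads (e.g. array f[2] or scalar v1); an
// interior node applies an operation to its children.
struct Node {
  int code;                       // 0 = leaf, 1 = add, 2 = mul, ...
  int base;                       // leaf only: which object is read
  std::vector<const Node *> kids; // operands, in order
};

// Memo table keyed on an ordered pair of nodes (stand-in for the
// pair_hash-based hash_map in the patch).
using PairCache = std::map<std::pair<const Node *, const Node *>, bool>;

// True if A and B apply the same operations to compatible operands.
bool compatible_p(const Node *a, const Node *b, PairCache &cache) {
  if (a == b)
    return true;
  const auto key = std::make_pair(a, b);
  auto it = cache.find(key);
  if (it != cache.end())
    return it->second;           // already decided for this pair

  bool ok = a->code == b->code && a->kids.size() == b->kids.size();
  if (ok && a->code == 0)
    ok = a->base == b->base;     // leaves must read the same object
  for (std::size_t i = 0; ok && i < a->kids.size(); ++i)
    ok = compatible_p(a->kids[i], b->kids[i], cache);

  cache[key] = ok;               // remember the verdict for this pair
  return ok;
}
```

Memoising on the (A, B) pair means a subtree shared between several candidate patterns is compared only once per pair, which is presumably why the cache is threaded through every recognize call.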


* RE: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms
  2021-12-17 16:48     ` Richard Sandiford
@ 2021-12-20 16:20       ` Tamar Christina
  2022-01-11  7:10         ` Tamar Christina
  2022-02-01 11:04         ` Richard Sandiford
  0 siblings, 2 replies; 18+ messages in thread
From: Tamar Christina @ 2021-12-20 16:20 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 6536 bytes --]



> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Friday, December 17, 2021 4:49 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [2/3 PATCH]AArch64 use canonical ordering for complex mul,
> fma and fms
> 
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > Tamar Christina <tamar.christina@arm.com> writes:
> >> Hi All,
> >>
> >> After the first patch in the series this updates the optabs to expect
> >> the canonical sequence.
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Ok for master? and backport along with the first patch?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >> 	PR tree-optimization/102819
> >> 	PR tree-optimization/103169
> >> 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4,
> >> 	cmul<conj_op><mode>3): Use canonical order.
> >> 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4,
> >> 	cmul<conj_op><mode>3): Likewise.
> >>
> >> --- inline copy of patch --
> >> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >> index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9cfb5649f9b0e73 100644
> >> --- a/gcc/config/aarch64/aarch64-simd.md
> >> +++ b/gcc/config/aarch64/aarch64-simd.md
> >> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
> >>  ;; remainder.  Because of this, expand early.
> >>  (define_expand "cml<fcmac1><conj_op><mode>4"
> >>    [(set (match_operand:VHSDF 0 "register_operand")
> >> -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> >> -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
> >> -				   (match_operand:VHSDF 3 "register_operand")]
> >> -				   FCMLA_OP)))]
> >> +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> >> +				   (match_operand:VHSDF 2 "register_operand")]
> >> +				   FCMLA_OP)
> >> +		    (match_operand:VHSDF 3 "register_operand")))]
> >>    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
> >>  {
> >>    rtx tmp = gen_reg_rtx (<MODE>mode);
> >> -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
> >> -						 operands[3], operands[2]));
> >> +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
> >> +						 operands[1], operands[2]));
> >>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
> >> -						 operands[3], operands[2]));
> >> +						 operands[1], operands[2]));
> >>    DONE;
> >>  })
> >>
> >> @@ -583,9 +583,9 @@ (define_expand "cmul<conj_op><mode>3"
> >>    rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
> >>    rtx res1 = gen_reg_rtx (<MODE>mode);
> >>    emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
> >> -						 operands[2], operands[1]));
> >> +						 operands[1], operands[2]));
> >>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
> >> -						 operands[2], operands[1]));
> >> +						 operands[1], operands[2]));
> >
> > This doesn't look right.  Going from the documentation, patch 1 isn't
> > changing the operand order for CMUL: the conjugated operand (if there
> > is one) is still operand 2.  The FCMLA sequences use the opposite
> > order, where the conjugated operand (if there is one) is operand 1.
> > So I think
> 
> I meant “the first multiplication operand” rather than “operand 1” here.
> 
> > the reversal here is still needed.
> >
> > Same for the multiplication operands in CML* above.

I did actually change the order in patch 1 but didn't update the docs.
I did that to follow the SLP order again, but I've now updated
them to do what the docs say.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and backport along with the first patch?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4): Use
	canonical order.
	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4): Likewise.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..9e41610fba85862ef7675bea1e5731b14cab59ce 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
 ;; remainder.  Because of this, expand early.
 (define_expand "cml<fcmac1><conj_op><mode>4"
   [(set (match_operand:VHSDF 0 "register_operand")
-	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
-		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
-				   (match_operand:VHSDF 3 "register_operand")]
-				   FCMLA_OP)))]
+	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+				   (match_operand:VHSDF 2 "register_operand")]
+				   FCMLA_OP)
+		    (match_operand:VHSDF 3 "register_operand")))]
   "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
-  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
-						 operands[3], operands[2]));
+  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
+						 operands[2], operands[1]));
   emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
-						 operands[3], operands[2]));
+						 operands[2], operands[1]));
   DONE;
 })
 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 9ef968840c20a3049901b3f8a919cf27ded1da3e..9ed19017c480b88779e9e3b08c0e031be60a8c12 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -7278,11 +7278,11 @@ (define_expand "cml<fcmac1><conj_op><mode>4"
   rtx tmp = gen_reg_rtx (<MODE>mode);
   emit_insn
     (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, operands[4],
-					     operands[3], operands[2],
-					     operands[1], operands[5]));
+					     operands[2], operands[1],
+					     operands[3], operands[5]));
   emit_insn
     (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], operands[4],
-					     operands[3], operands[2],
+					     operands[2], operands[1],
 					     tmp, operands[5]));
   DONE;
 })
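
To make the operand-order argument in the review concrete, here is a scalar C++ model of a two-instruction FCMLA sequence. The per-rotation updates paraphrase Arm's description of FCMLA, where acc is the accumulator, n the first multiplicand and m the second. The model does not cover which rotations GCC's <rotsplit1>/<rotsplit2> iterators actually select, and fcmla_step, fcmla_mul_add, fcmla_conj_mul_add and cml_conj are hypothetical names. What it checks is that the #0/#270 pair conjugates n. The documented op0 = op1 * conj (op2) + op3 therefore needs op2 passed as the first multiplicand, which matches the operands[2], operands[1] order in the expanders above.

```cpp
#include <complex>

using cf = std::complex<float>;

// One FCMLA-style step on a single complex element (model only).
cf fcmla_step(cf acc, cf n, cf m, int rot) {
  float re = acc.real(), im = acc.imag();
  switch (rot) {
  case 0:   re += n.real() * m.real(); im += n.real() * m.imag(); break;
  case 90:  re -= n.imag() * m.imag(); im += n.imag() * m.real(); break;
  case 180: re -= n.real() * m.real(); im -= n.real() * m.imag(); break;
  case 270: re += n.imag() * m.imag(); im -= n.imag() * m.real(); break;
  default:  break;                 // other rotations: leave acc unchanged
  }
  return {re, im};
}

// #0 then #90 gives acc + n * m.
cf fcmla_mul_add(cf acc, cf n, cf m) {
  return fcmla_step(fcmla_step(acc, n, m, 0), n, m, 90);
}

// #0 then #270 gives acc + conj (n) * m: the conjugate lands on n.
cf fcmla_conj_mul_add(cf acc, cf n, cf m) {
  return fcmla_step(fcmla_step(acc, n, m, 0), n, m, 270);
}

// Documented semantics op0 = op1 * conj (op2) + op3: because the
// sequence conjugates its first multiplicand, op2 goes first.
cf cml_conj(cf op1, cf op2, cf op3) {
  return fcmla_conj_mul_add(op3, op2, op1);
}
```

Passing op1 first instead would compute conj (op1) * op2 + op3, which is a different value whenever the imaginary parts are non-zero.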


* RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms
  2021-12-17 15:43 ` [3/3 PATCH][AArch32] " Tamar Christina
@ 2021-12-20 16:22   ` Tamar Christina
  2022-01-11  7:10     ` Tamar Christina
  2022-02-01  9:56     ` Kyrylo Tkachov
  0 siblings, 2 replies; 18+ messages in thread
From: Tamar Christina @ 2021-12-20 16:22 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana Radhakrishnan, Richard Earnshaw, nickc, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 1749 bytes --]

Updated version of patch following AArch64 review.

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master? and backport along with the first patch?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/102819
	PR tree-optimization/103169
	* config/arm/vec-common.md (cml<fcmac1><conj_op><mode>4): Use
	canonical order.

--- inline copy of patch ---

diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index e71d9b3811fde62159f5c21944fef9fe3f97b4bd..eab77ac8decce76d70f5b2594f4439e6ed363e6e 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -265,18 +265,18 @@ (define_expand "arm_vcmla<rot><mode>"
 ;; remainder.  Because of this, expand early.
 (define_expand "cml<fcmac1><conj_op><mode>4"
   [(set (match_operand:VF 0 "register_operand")
-	(plus:VF (match_operand:VF 1 "register_operand")
-		 (unspec:VF [(match_operand:VF 2 "register_operand")
-			     (match_operand:VF 3 "register_operand")]
-			    VCMLA_OP)))]
+	(plus:VF (unspec:VF [(match_operand:VF 1 "register_operand")
+			     (match_operand:VF 2 "register_operand")]
+			    VCMLA_OP)
+		 (match_operand:VF 3 "register_operand")))]
   "(TARGET_COMPLEX || (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT
 		      && ARM_HAVE_<MODE>_ARITH)) && !BYTES_BIG_ENDIAN"
 {
   rtx tmp = gen_reg_rtx (<MODE>mode);
-  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[1],
-					     operands[3], operands[2]));
+  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[3],
+					     operands[2], operands[1]));
   emit_insn (gen_arm_vcmla<rotsplit2><mode> (operands[0], tmp,
-					     operands[3], operands[2]));
+					     operands[2], operands[1]));
   DONE;
 })



* RE: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
  2021-12-20 16:18   ` Tamar Christina
@ 2022-01-10 10:16     ` Tamar Christina
  0 siblings, 0 replies; 18+ messages in thread
From: Tamar Christina @ 2022-01-10 10:16 UTC (permalink / raw)
  To: Richard Sandiford, GCC Patches; +Cc: nd, rguenther

ping
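
As a quick numeric check of what the pr102819-2.c and pr102819-3.c tests in the quoted patch distinguish, the sketch below transcribes one iteration of each loop body with a = f[1] and b = f[2]. The function names are hypothetical. good1 equals a * (b + (v2 + v1*i)), a genuine complex multiply. In bad1 the real lane behaves as if the addend were v1 + v2*i, while the imaginary lane uses v2 + v1*i, so the two lanes do not come from one complex multiply with a common addend.

```cpp
#include <complex>

using cf = std::complex<float>;

// One (r, i) iteration of good1 in pr102819-3.c, with a = f[1], b = f[2].
cf good1_body(cf a, cf b, float v1, float v2) {
  return {a.real() * (b.real() + v2) - a.imag() * (b.imag() + v1),
          a.real() * (b.imag() + v1) + a.imag() * (b.real() + v2)};
}

// One (r, i) iteration of bad1 in pr102819-2.c: v1 and v2 are swapped
// in the real lane only.
cf bad1_body(cf a, cf b, float v1, float v2) {
  return {a.real() * (b.real() + v1) - a.imag() * (b.imag() + v2),
          a.real() * (b.imag() + v1) + a.imag() * (b.real() + v2)};
}
```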

> -----Original Message-----
> From: Tamar Christina
> Sent: Monday, December 20, 2021 4:19 PM
> To: Richard Sandiford <richard.sandiford@arm.com>; Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: nd <nd@arm.com>; rguenther@suse.de
> Subject: RE: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
> 
> 
> 
> > -----Original Message-----
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Friday, December 17, 2021 4:19 PM
> > To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> > Cc: Tamar Christina <Tamar.Christina@arm.com>; nd <nd@arm.com>; rguenther@suse.de
> > Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
> >
> > Just a comment on the documentation:
> >
> > Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
> > >  a multiply and accumulate of complex numbers.
> > >
> > >  @smallexample
> > > -  complex TYPE c[N];
> > > -  complex TYPE a[N];
> > > -  complex TYPE b[N];
> > > +  complex TYPE op0[N];
> > > +  complex TYPE op1[N];
> > > +  complex TYPE op2[N];
> > >    for (int i = 0; i < N; i += 1)
> > >      @{
> > > -      c[i] += a[i] * b[i];
> > > +      op2[i] += op1[i] * op2[i];
> > >      @}
> >
> > I think this should be:
> >
> >   op0[i] = op1[i] * op2[i] + op3[i];
> >
> > since operand 0 is the output and operand 3 is the accumulator input.
> >
> > Same idea for the others.  For:
> >
> > > @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
> > >  complex numbers.
> > >
> > >  @smallexample
> > > -  complex TYPE c[N];
> > > -  complex TYPE a[N];
> > > -  complex TYPE b[N];
> > > +  complex TYPE op0[N];
> > > +  complex TYPE op1[N];
> > > +  complex TYPE op2[N];
> > >    for (int i = 0; i < N; i += 1)
> > >      @{
> > > -      c[i] = a[i] * b[i];
> > > +      op2[i] = op0[i] * op1[i];
> >
> > …this I think it should be:
> >
> >   op0[i] = op1[i] * op2[i];
> 
> Updated patch attached.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no regressions.
> 
> Ok for master? and backport to GCC 11 after some stew?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/102819
> 	PR tree-optimization/103169
> 	* doc/md.texi: Update docs for cfms, cfma.
> 	* tree-data-ref.h (same_data_refs): Accept optional offset.
> 	* tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
> 	patterns.
> 	(vect_normalize_conj_loc): Remove.
> 	(is_eq_or_top): Change to take two nodes.
> 	(enum _conj_status, compatible_complex_nodes_p,
> 	vect_validate_multiplication): New.
> 	(class complex_add_pattern, complex_add_pattern::matches,
> 	complex_add_pattern::recognize, class complex_mul_pattern,
> 	complex_mul_pattern::recognize, class complex_fms_pattern,
> 	complex_fms_pattern::recognize, class complex_operations_pattern,
> 	complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
> 	new cache.
> 	(complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
> 	cache and use new validation code.
> 	* tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
> 	vect_analyze_slp): Pass along cache.
> 	(compatible_calls_p): Expose.
> 	* tree-vectorizer.h (compatible_calls_p, slp_node_hash,
> 	slp_compat_nodes_map_t): New.
> 	(class vect_pattern): Update signatures include new cache.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/102819
> 	PR tree-optimization/103169
> 	* g++.dg/vect/pr99149.cc: xfail for now.
> 	* gcc.dg/vect/complex/pr102819-1.c: New test.
> 	* gcc.dg/vect/complex/pr102819-2.c: New test.
> 	* gcc.dg/vect/complex/pr102819-3.c: New test.
> 	* gcc.dg/vect/complex/pr102819-4.c: New test.
> 	* gcc.dg/vect/complex/pr102819-5.c: New test.
> 	* gcc.dg/vect/complex/pr102819-6.c: New test.
> 	* gcc.dg/vect/complex/pr102819-7.c: New test.
> 	* gcc.dg/vect/complex/pr102819-8.c: New test.
> 	* gcc.dg/vect/complex/pr102819-9.c: New test.
> 	* gcc.dg/vect/complex/pr103169.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..ad06b02d36876082afe4c3f3fb51887f7a522b23 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6325,12 +6325,13 @@ Perform a vector multiply and accumulate that is semantically the same as
>  a multiply and accumulate of complex numbers.
> 
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] += a[i] * b[i];
> +      op0[i] = op1[i] * op2[i] + op3[i];
>      @}
>  @end smallexample
> 
> @@ -6348,12 +6349,13 @@ the same as a multiply and accumulate of complex numbers where the second
>  multiply arguments is conjugated.
> 
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] += a[i] * conj (b[i]);
> +      op0[i] = op1[i] * conj (op2[i]) + op3[i];
>      @}
>  @end smallexample
> 
> @@ -6370,12 +6372,13 @@ Perform a vector multiply and subtract that is semantically the same as
>  a multiply and subtract of complex numbers.
> 
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] -= a[i] * b[i];
> +      op0[i] = op1[i] * op2[i] - op3[i];
>      @}
>  @end smallexample
> 
> @@ -6393,12 +6396,13 @@ the same as a multiply and subtract of complex numbers where the second
>  multiply arguments is conjugated.
> 
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] -= a[i] * conj (b[i]);
> +      op0[i] = op1[i] * conj (op2[i]) - op3[i];
>      @}
>  @end smallexample
> 
> @@ -6415,12 +6419,12 @@ Perform a vector multiply that is semantically the same as multiply of
>  complex numbers.
> 
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] = a[i] * b[i];
> +      op0[i] = op1[i] * op2[i];
>      @}
>  @end smallexample
> 
> @@ -6437,12 +6441,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
>  multiply of complex numbers where the second multiply arguments is conjugated.
> 
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] = a[i] * conj (b[i]);
> +      op0[i] = op1[i] * conj (op2[i]);
>      @}
>  @end smallexample
> 
> diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
> index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
> --- a/gcc/testsuite/g++.dg/vect/pr99149.cc
> +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
> @@ -24,4 +24,4 @@ public:
>  } n;
>  main() { n.j(); }
> 
> -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 4)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
> +      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
> +      //                  ^^^^^^^             ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good1()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good2()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
> +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
> +      //                  ^^^^^^^             ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad2()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
> +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
> +      //                          ^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad3()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> +      //                            ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
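The three `bad*` tests each break the symmetry between the two multiplications in a different way: `bad1` swaps in `f[3]` for one term, `bad2` applies `+ 1` to only one product, and `bad3` reuses the real part of `f[1]`. The C sketch below (all function names hypothetical) evaluates one element of each loop and shows it does not equal the complex product it superficially resembles, which is why the new compatibility check has to reject them.

```c
#include <assert.h>

/* One complex element of each rejected loop.  X, Y and Z stand for the
   {real, imag} pairs read from f[1], f[2] and f[3].  */
static void
bad1_elem (float d[2], const float x[2], const float y[2], const float z[2])
{
  d[0] = x[0] * y[0] - x[1] * z[1];   /* f[3] replaces f[2] in one term */
  d[1] = x[0] * y[1] + x[1] * z[0];
}

static void
bad2_elem (float d[2], const float x[2], const float y[2])
{
  d[0] = x[0] * (y[0] + 1) - x[1] * y[1];   /* "+ 1" on one product only */
  d[1] = x[0] * (y[1] + 1) + x[1] * y[0];
}

static void
bad3_elem (float d[2], const float x[2], const float y[2])
{
  d[0] = x[0] * y[0] - x[0] * y[1];   /* real part of X used twice */
  d[1] = x[0] * y[1] + x[1] * y[0];
}

/* Reference complex product D = X * Y on {real, imag} pairs.  */
static void
cmul_elem (float d[2], const float x[2], const float y[2])
{
  d[0] = x[0] * y[0] - x[1] * y[1];
  d[1] = x[0] * y[1] + x[1] * y[0];
}
```

With X = 2 + 3i, Y = 4 + 5i and Z = 7 + 11i the true product X * Y is -7 + 22i, while the three loops give -25 + 31i, -5 + 24i and -2 + 22i; `bad2` also differs from X * (Y + 1 + 1i) = -8 + 27i.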
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +#include <stdio.h>
> +#include <complex.h>
> +
> +#define N 200
> +#define TYPE float
> +#define TYPE2 float
> +
> +void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> +{
> +  for (int i=0; i < N; i++)
> +    {
> +      c[i] -=  a[i] * b[0];
> +    }
> +}
> +
> +/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.  */
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
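The loop in this test subtracts a product with the loop-invariant factor `b[0]`. It is expected to become COMPLEX_FMS but is marked `xfail` because, per the test's own comment, the pattern currently overlaps with COMPLEX_ADD. For reference, a hypothetical scalar expansion of the statement on interleaved {real, imag} floats:

```c
#include <assert.h>
#include <complex.h>

/* Scalar expansion of  c[i] -= a[i] * b[0]  on interleaved {real, imag}
   float pairs.  B0 is loop-invariant: the same complex value is used in
   every iteration.  */
static void
fms_invariant (float *c, const float *a, const float b0[2], int n)
{
  for (int k = 0; k < n; k++)
    {
      float ar = a[2 * k], ai = a[2 * k + 1];
      c[2 * k]     -= ar * b0[0] - ai * b0[1];   /* real lane */
      c[2 * k + 1] -= ar * b0[1] + ai * b0[0];   /* imaginary lane */
    }
}
```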
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { vect_double } } } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
> +
> +_Complex double b_0, c_0;
> +
> +void
> +mul270snd (void)
> +{
> +  c_0 = b_0 * 1.0iF * 1.0iF;
> +}
> +
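This test is compile-only (`dg-do compile`). For orientation, the statement multiplies `b_0` by the imaginary unit twice, which amounts to negating it. A small C check, using `I` from `<complex.h>` in place of the GNU `1.0iF` literal (the helper name is hypothetical):

```c
#include <assert.h>
#include <complex.h>

/* Multiplying by i rotates a complex value by 90 degrees, so doing it
   twice negates it: (x + yi) * i * i == -x - yi.  */
static double complex
mul_i_twice (double complex b)
{
  return b * I * I;
}
```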
> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
>  }
> 
>  /* Return true when the data references A and B are accessing the same
> -   memory object with the same access functions.  */
> +   memory object with the same access functions.  Optionally skip the
> +   last OFFSET dimensions in the data reference.  */
> 
>  static inline bool
> -same_data_refs (data_reference_p a, data_reference_p b)
> +same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
>  {
>    unsigned int i;
> 
> @@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
>    if (!same_data_refs_base_objects (a, b))
>      return false;
> 
> -  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> +  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
>      if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
>        return false;
> 
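The new OFFSET parameter lets the caller treat, for example, `f[1][r]` and `f[1][i]` as the same object while still rejecting `f[2][...]` against `f[3][...]`. Below is a toy C model of the comparison loop in this hunk. It assumes, as the updated comment implies, that access function 0 belongs to the last (innermost) subscript; it reduces access functions to plain ints, whereas GCC compares real evolutions with `eq_evolutions_p`. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a data reference: a base object id plus one
   "access function" per subscript, slot 0 being the last subscript.  */
struct toy_dr
{
  int base;
  int ndims;
  int access_fn[4];
};

/* Same base and same number of dimensions, then compare the access
   functions starting at OFFSET, i.e. skipping the last OFFSET subscripts.  */
static bool
toy_same_data_refs (const struct toy_dr *a, const struct toy_dr *b, int offset)
{
  if (a->base != b->base || a->ndims != b->ndims)
    return false;
  for (int i = offset; i < a->ndims; i++)
    if (a->access_fn[i] != b->access_fn[i])
      return false;
  return true;
}
```

Encoding `f[row][r + k]` as `{ k, row }`, `f[1][r]` and `f[1][r + 1]` differ with OFFSET 0 but match with OFFSET 1, while `f[1][r]` and `f[3][r + 1]` still differ.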
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 0350441fad9690cd5d04337171ca3470a064a571..020c29bba08c5bd80503a2dbc04292f8fd310b3c 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
>    int valid_patterns = 4;
>    FOR_EACH_VEC_ELT (loads, i, load)
>      {
> -      if (candidates[0] != PERM_UNKNOWN && load != 1)
> +      unsigned adj_load = load % 2;
> +      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
>  	{
>  	  candidates[0] = PERM_UNKNOWN;
>  	  valid_patterns--;
>  	}
> -      if (candidates[1] != PERM_UNKNOWN && load != 0)
> +      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
>  	{
>  	  candidates[1] = PERM_UNKNOWN;
>  	  valid_patterns--;
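The hunk shows only the two duplicate-lane candidates of `is_linear_load_p`, and the patch now tests `load % 2` instead of `load`. A hypothetical C reduction of just those two checks shows the effect: a permutation that reads element 2 (or 3) in every lane is now classified like one that reads element 0 (or 1). The remaining candidates of the real function are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification: all lanes read an even element, all lanes
   read an odd element, or neither.  */
enum dup_kind { DUP_EVEN, DUP_ODD, DUP_NONE };

static enum dup_kind
classify_dup (const unsigned *loads, size_t n)
{
  if (n == 0)
    return DUP_NONE;

  int all_even = 1, all_odd = 1;
  for (size_t k = 0; k < n; k++)
    {
      /* As in the patch, only the parity of the load index matters.  */
      unsigned adj_load = loads[k] % 2;
      if (adj_load != 1)
        all_odd = 0;
      if (adj_load != 0)
        all_even = 0;
    }
  return all_odd ? DUP_ODD : all_even ? DUP_EVEN : DUP_NONE;
}
```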
> @@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -	     vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> 
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +	       slp_tree *);
> 
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
>  internal_fn
>  complex_add_pattern::matches (complex_operation_t op,
>  			      slp_tree_to_load_perm_map_t *perm_cache,
> +			      slp_compat_nodes_map_t * /* compat_cache */,
>  			      slp_tree *node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
> 
>  vect_pattern*
>  complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +				slp_compat_nodes_map_t *compat_cache,
>  				slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_add_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
> 
> @@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
>   * complex_mul_pattern
> 
>  ******************************************************************************/
> 
> -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
> -   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> -
> -   If a negate is found then the values in ARGS are reordered such that the
> -   negate node is always the second one and the entry is replaced by the child
> -   of the negate node.  */
> +/* Helper function to check if PERM is KIND or PERM_TOP.  */
> 
>  static inline bool
> -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL)
> +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
> +	      slp_tree op1, complex_perm_kinds_t kind1,
> +	      slp_tree op2, complex_perm_kinds_t kind2)
>  {
> -  gcc_assert (args.length () == 2);
> -  bool neg_found = false;
> -
> -  if (vect_match_expression_p (args[0], NEGATE_EXPR))
> -    {
> -      std::swap (args[0], args[1]);
> -      neg_found = true;
> -      if (neg_first_p)
> -	*neg_first_p = true;
> -    }
> -  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> -    {
> -      neg_found = true;
> -      if (neg_first_p)
> -	*neg_first_p = false;
> -    }
> +  complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
> +  if (perm1 != kind1 && perm1 != PERM_TOP)
> +    return false;
> 
> -  if (neg_found)
> -    args[1] = SLP_TREE_CHILDREN (args[1])[0];
> +  complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
> +  if (perm2 != kind2 && perm2 != PERM_TOP)
> +    return false;
> 
> -  return neg_found;
> +  return true;
>  }
> 
> -/* Helper function to check if PERM is KIND or PERM_TOP.  */
> +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
> 
>  static inline bool
> -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
> +compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
> +			    slp_tree a, int *pa, slp_tree b, int *pb)
>  {
> -  return perm == kind || perm == PERM_TOP;
> -}
> +  bool *tmp;
> +  std::pair<slp_tree, slp_tree> key = std::make_pair(a, b);
> +  if ((tmp = compat_cache->get (key)) != NULL)
> +    return *tmp;
> 
> -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
> -   nodes but also that they represent an operation that is either a complex
> -   multiplication or a complex multiplication by conjugated value.
> +   compat_cache->put (key, false);
> 
> -   Of the negation is expected to be in the first half of the tree (As required
> -   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjugate
> -   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
> -   second operand contains the conjugate operation.  */
> +  if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
> +    return false;
> 
> -static inline bool
> -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> -			      const vec<slp_tree> &left_op,
> -			      const vec<slp_tree> &right_op,
> -			     bool neg_first, bool *conj_first_operand,
> -			     bool fms)
> -{
> -  /* The presence of a negation indicates that we have either a conjugate or a
> -     rotation.  We need to distinguish which one.  */
> -  *conj_first_operand = false;
> -  complex_perm_kinds_t kind;
> -
> -  /* Complex conjugates have the negation on the imaginary part of the
> -     number where rotations affect the real component.  So check if the
> -     negation is on a dup of lane 1.  */
> -  if (fms)
> +  if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
> +    return false;
> +
> +  /* Only internal nodes can be loads, as such we can't check further if they
> +     are externals.  */
> +  if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
>      {
> -      /* Canonicalization for fms is not consistent. So have to test both
> -	 variants to be sure.  This needs to be fixed in the mid-end so
> -	 this part can be simpler.  */
> -      kind = linear_loads_p (perm_cache, right_op[0]);
> -      if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
> -	   && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> -			     PERM_ODDEVEN))
> -	  || (kind == PERM_ODDEVEN
> -	      && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> -			     PERM_ODDODD))))
> -	return false;
> +      for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
> +	{
> +	  tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
> +	  tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
> +	  if (!operand_equal_p (op1, op2, 0))
> +	    return false;
> +	}
> +
> +      compat_cache->put (key, true);
> +      return true;
>      }
> +
> +  auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
> +  auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
> +
> +  if (gimple_code (a_stmt) != gimple_code (b_stmt))
> +    return false;
> +
> +  /* code, children, type, externals, loads, constants  */
> +  if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
> +    return false;
> +
> +  /* At this point, a and b are known to be the same gimple operations.  */
> +  if (is_gimple_call (a_stmt))
> +    {
> +	if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
> +				 dyn_cast <gcall *> (b_stmt)))
> +	  return false;
> +    }
> +  else if (!is_gimple_assign (a_stmt))
> +    return false;
>    else
>      {
> -      if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
> -	  && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
> -			    PERM_ODDEVEN))
> +      tree_code acode = gimple_assign_rhs_code (a_stmt);
> +      tree_code bcode = gimple_assign_rhs_code (b_stmt);
> +      if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
> +	  && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
> +	return true;
> +
> +      if (acode != bcode)
>  	return false;
>      }
> 
> -  /* Deal with differences in indexes.  */
> -  int index1 = fms ? 1 : 0;
> -  int index2 = fms ? 0 : 1;
> -
> -  /* Check if the conjugate is on the second first or second operand.  The
> -     order of the node with the conjugate value determines this, and the dup
> -     node must be one of lane 0 of the same DR as the neg node.  */
> -  kind = linear_loads_p (perm_cache, left_op[index1]);
> -  if (kind == PERM_TOP)
> +  if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
> +      || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
>      {
> -      if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
> -	return true;
> +      for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
> +	{
> +	  tree t1 = gimple_arg (a_stmt, i);
> +	  tree t2 = gimple_arg (b_stmt, i);
> +	  if (TREE_CODE (t1) != TREE_CODE (t2))
> +	    return false;
> +
> +	  /* If SSA name then we will need to inspect the children
> +	     so we can punt here.  */
> +	  if (TREE_CODE (t1) == SSA_NAME)
> +	    continue;
> +
> +	  if (!operand_equal_p (t1, t2, 0))
> +	    return false;
> +	}
>      }
> -  else if (kind == PERM_EVENODD && !neg_first)
> +  else
>      {
> -      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN)
> +      auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
> +      auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
> +      /* Don't check the last dimension as that's checked by the linearity
> +	 checks.  This check is also much stricter than what we need
> +	 because it doesn't consider loading from adjacent elements
> +	 in the same struct as loading from the same base object.
> +	 But for now, I'll play it safe.  */
> +      if (!same_data_refs (dr1, dr2, 1))
>  	return false;
> -      return true;
>      }
> -  else if (kind == PERM_EVENEVEN && neg_first)
> +
> +  for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
>      {
> -      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD)
> +      if (!compatible_complex_nodes_p (compat_cache,
> +				       SLP_TREE_CHILDREN (a)[i], pa,
> +				       SLP_TREE_CHILDREN (b)[i], pb))
>  	return false;
> -
> -      *conj_first_operand = true;
> -      return true;
>      }
> -  else
> -    return false;
> -
> -  if (kind != PERM_EVENEVEN)
> -    return false;
> 
> +  compat_cache->put (key, true);
>    return true;
>  }
> 
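`compatible_complex_nodes_p` is a memoised structural comparison: it looks the node pair up in `compat_cache`, records a provisional `false` before recursing, and stores `true` only once all children have matched. A toy C model of that caching scheme follows. The node layout, the id-indexed cache array and all names are assumptions for illustration, and unlike the patch it ignores the lane mapping (PA/PB) used when comparing externals.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical node: leaves (code == -1) carry a value, inner nodes an
   operation code and up to two children.  ID indexes the cache.  */
struct toy_node
{
  int id;
  int code;
  int value;
  int nchildren;
  const struct toy_node *child[2];
};

enum cache_state { UNSEEN = 0, NO, YES };
static enum cache_state cache[MAX_NODES][MAX_NODES];

static bool
toy_compatible_p (const struct toy_node *a, const struct toy_node *b)
{
  enum cache_state *slot = &cache[a->id][b->id];
  if (*slot != UNSEEN)
    return *slot == YES;

  /* Provisional "no", like compat_cache->put (key, false); any early
     return below simply leaves it in place.  */
  *slot = NO;

  if (a->code != b->code || a->nchildren != b->nchildren)
    return false;
  if (a->code == -1 && a->value != b->value)
    return false;
  for (int i = 0; i < a->nchildren; i++)
    if (!toy_compatible_p (a->child[i], b->child[i]))
      return false;

  *slot = YES;
  return true;
}
```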
> -/* Helper function to help distinguish between a conjugate and a rotation in a
> -   complex multiplication.  The operations have similar shapes but the order of
> -   the load permutes are different.  This function returns TRUE when the order
> -   is consistent with a multiplication or multiplication by conjugated
> -   operand but returns FALSE if it's a multiplication by rotated operand.  */
> -
>  static inline bool
>  vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> -			      const vec<slp_tree> &op,
> -			      complex_perm_kinds_t permKind)
> +			      slp_compat_nodes_map_t *compat_cache,
> +			      vec<slp_tree> &left_op,
> +			      vec<slp_tree> &right_op,
> +			      bool subtract,
> +			      enum _conj_status *_status)
>  {
> -  /* The left node is the more common case, test it first.  */
> -  if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
> +  auto_vec<slp_tree> ops;
> +  enum _conj_status stats = CONJ_NONE;
> +
> +  /* The complex operations can occur in two layouts and two permute sequences
> +     so declare them and re-use them.  */
> +  int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}.  */
> +		    , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}.  */
> +		    };
> +
> +  /* Now for the corresponding permutes that go with these values.  */
> +  complex_perm_kinds_t perms[][4]
> +    = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
> +      , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
> +      };
> +
> +  /* These permutes are used during comparisons of externals on which
> +     we require strict equality.  */
> +  int cq[][4][2]
> +    = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
> +      , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
> +      };
> +
> +  /* Default to style and perm 0, most operations use this one.  */
> +  int style = 0;
> +  int perm = subtract ? 1 : 0;
> +
> +  /* Check if we have a negate operation, if so absorb the node and continue
> +     looking.  */
> +  bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
> +  bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
> +
> +  /* Determine which style we're looking at.  We only have different ones
> +     whenever a conjugate is involved.  */
> +  if (neg0 && neg1)
> +    ;
> +  else if (neg0)
>      {
> -      if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
> -	return false;
> +      right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
> +      stats = CONJ_FST;
> +      if (subtract)
> +	perm = 0;
>      }
> -  return true;
> +  else if (neg1)
> +    {
> +      right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
> +      stats = CONJ_SND;
> +      perm = 1;
> +    }
> +
> +  *_status = stats;
> +
> +  /* Flatten the inputs after we've remapped them.  */
> +  ops.create (4);
> +  ops.safe_splice (left_op);
> +  ops.safe_splice (right_op);
> +
> +  /* Extract out the elements to check.  */
> +  slp_tree op0 = ops[styles[style][0]];
> +  slp_tree op1 = ops[styles[style][1]];
> +  slp_tree op2 = ops[styles[style][2]];
> +  slp_tree op3 = ops[styles[style][3]];
> +
> +  /* Do cheapest test first.  If failed no need to analyze further.  */
> +  if (linear_loads_p (perm_cache, op0) != perms[perm][0]
> +      || linear_loads_p (perm_cache, op1) != perms[perm][1]
> +      || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
> +    return false;
> +
> +  return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
> +				     cq[perm][1])
> +	 && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
> +					cq[perm][3]);
>  }
> 
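The permute check in the new `vect_validate_multiplication` is table-driven: the operands are flattened as { L1, L2, R1, R2 } (LEFT_OP spliced before RIGHT_OP), reordered through `styles[style]`, and compared against row `perm` of `perms`, with the last two allowed to be PERM_TOP. In the code shown `style` is always 0; only `perm` changes with SUBTRACT and with the position of a conjugate. A C model of just that lookup, where the enum is a hypothetical stand-in for `complex_perm_kinds_t`:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the patch's complex_perm_kinds_t values.  */
enum pk { EVENEVEN, ODDODD, EVENODD, ODDEVEN, TOP };

/* Same tables as in vect_validate_multiplication.  */
static const int styles[][4] = { { 0, 2, 1, 3 }, { 0, 3, 1, 2 } };
static const enum pk perms[][4]
  = { { EVENEVEN, ODDODD, EVENODD, ODDEVEN },
      { EVENODD, ODDEVEN, EVENEVEN, ODDODD } };

/* KINDS holds the load-permutation kind of the flattened operands
   { L1, L2, R1, R2 }.  The first two selected operands must match row PERM
   exactly; like is_eq_or_top, the last two may also be TOP.  */
static bool
perms_match (const enum pk kinds[4], int style, int perm)
{
  enum pk k0 = kinds[styles[style][0]], k1 = kinds[styles[style][1]];
  enum pk k2 = kinds[styles[style][2]], k3 = kinds[styles[style][3]];
  return k0 == perms[perm][0]
         && k1 == perms[perm][1]
         && (k2 == perms[perm][2] || k2 == TOP)
         && (k3 == perms[perm][3] || k3 == TOP);
}
```

Assuming that for a plain A * B, L1 and R1 are the even and odd duplicates of A while L2 and R2 are the { Even, Odd } and { Odd, Even } permutes of B, the kinds { EVENEVEN, EVENODD, ODDODD, ODDEVEN } match row 0 and not row 1, consistent with the layout described in the patch's cover text.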
>  /* This function combines two nodes containing only even and only odd lanes
> @@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -	     vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> 
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +	       slp_tree *);
> 
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern
>  internal_fn
>  complex_mul_pattern::matches (complex_operation_t op,
>  			      slp_tree_to_load_perm_map_t *perm_cache,
> +			      slp_compat_nodes_map_t *compat_cache,
>  			      slp_tree *node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op,
>        || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
>      return IFN_LAST;
> 
> -  bool neg_first = false;
> -  bool conj_first_operand = false;
> -  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> +  enum _conj_status status;
> +  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
> +				     right_op, false, &status))
> +    return IFN_LAST;
> 
> -  if (!is_neg)
> +  if (status == CONJ_NONE)
>      {
> -      /* A multiplication needs to multiply agains the real pair, otherwise
> -	 the pattern matches that of FMS.   */
> -      if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
> -	  || vect_normalize_conj_loc (left_op))
> -	return IFN_LAST;
>        if (add0)
>  	ifn = IFN_COMPLEX_FMA;
>        else
> @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op,
>      }
>    else
>      {
> -      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> -					 neg_first, &conj_first_operand,
> -					 false))
> -	return IFN_LAST;
> -
>        if(add0)
>  	ifn = IFN_COMPLEX_FMA_CONJ;
>        else
> @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op,
>      ops->quick_push (add0);
> 
>    complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
> -  if (kind == PERM_EVENODD)
> +  if (kind == PERM_EVENODD || kind == PERM_TOP)
>      {
>        ops->quick_push (left_op[1]);
>        ops->quick_push (right_op[1]);
>        ops->quick_push (left_op[0]);
>      }
> -  else if (kind == PERM_TOP)
> -    {
> -      ops->quick_push (left_op[1]);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (left_op[0]);
> -    }
> -  else if (kind == PERM_EVENEVEN && !conj_first_operand)
> +  else if (kind == PERM_EVENEVEN && status != CONJ_SND)
>      {
>        ops->quick_push (left_op[0]);
>        ops->quick_push (right_op[0]);
> @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op,
> 
>  vect_pattern*
>  complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +				slp_compat_nodes_map_t *compat_cache,
>  				slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
> 
> @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo)
> 
>  	/* First re-arrange the children.  */
>  	SLP_TREE_CHILDREN (*this->m_node).safe_grow (3);
> -	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0];
> -	SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3];
> -	SLP_TREE_CHILDREN (*this->m_node)[2] = newnode;
> +	SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[3];
> +	SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
> +	SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0];
> 
>  	/* Tell the builder to expect an extra argument.  */
>  	this->m_num_args++;
> @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -	     vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> 
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +	       slp_tree *);
> 
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern
>  internal_fn
>  complex_fms_pattern::matches (complex_operation_t op,
>  			      slp_tree_to_load_perm_map_t *perm_cache,
> +			      slp_compat_nodes_map_t *compat_cache,
>  			      slp_tree * ref_node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op,
>    if (!vect_match_expression_p (root, MINUS_EXPR))
>      return IFN_LAST;
> 
> +  /* TODO: Support invariants here, with the new layout CADD now
> +	   can match before we get a chance to try CFMS.  */
>    auto nodes = SLP_TREE_CHILDREN (root);
>    if (!vect_match_expression_p (nodes[1], MULT_EXPR)
>        || vect_detect_pair_op (nodes[0]) != PLUS_MINUS)
> @@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op,
>        || !vect_match_expression_p (l0node[1], MULT_EXPR))
>      return IFN_LAST;
> 
> -  bool is_neg = vect_normalize_conj_loc (left_op);
> -
> -  bool conj_first_operand = false;
> -  if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
> -				     &conj_first_operand, true))
> +  enum _conj_status status;
> +  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
> +				     left_op, true, &status))
>      return IFN_LAST;
> 
> -  if (!is_neg)
> +  if (status == CONJ_NONE)
>      ifn = IFN_COMPLEX_FMS;
> -  else if (is_neg)
> +  else
>      ifn = IFN_COMPLEX_FMS_CONJ;
> 
>    if (!vect_pattern_validate_optab (ifn, *ref_node))
> @@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op,
>        ops->quick_push (right_op[1]);
>        ops->quick_push (left_op[1]);
>      }
> -  else if (kind == PERM_TOP)
> -    {
> -      ops->quick_push (l0node[0]);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[0]);
> -    }
> -  else if (kind == PERM_EVENEVEN && !is_neg)
> -    {
> -      ops->quick_push (l0node[0]);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[0]);
> -    }
>    else
>      {
>        ops->quick_push (l0node[0]);
>        ops->quick_push (right_op[1]);
>        ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[1]);
> +      ops->quick_push (left_op[0]);
>      }
> 
>    return ifn;
> @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op,
> 
>  vect_pattern*
>  complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +				slp_compat_nodes_map_t *compat_cache,
>  				slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
> 
> @@ -1305,9 +1353,9 @@ complex_fms_pattern::build (vec_info *vinfo)
>    SLP_TREE_CHILDREN (*this->m_node).create (3);
> 
>    /* First re-arrange the children.  */
> -  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
>    SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
>    SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> +  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
> 
>    /* And then rewrite the node itself.  */
>    complex_pattern::build (vinfo);
> @@ -1334,11 +1382,12 @@ class complex_operations_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -	     vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +	     slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> 
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +	       slp_tree *);
>  };
> 
>  /* Dummy matches implementation for proxy object.  */
> @@ -1347,6 +1396,7 @@ internal_fn
>  complex_operations_pattern::
>  matches (complex_operation_t /* op */,
>  	 slp_tree_to_load_perm_map_t * /* perm_cache */,
> +	 slp_compat_nodes_map_t * /* compat_cache */,
>  	 slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
>  {
>    return IFN_LAST;
> @@ -1356,6 +1406,7 @@ matches (complex_operation_t /* op */,
> 
>  vect_pattern*
>  complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +				       slp_compat_nodes_map_t *ccache,
>  				       slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
> @@ -1363,15 +1414,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn = IFN_LAST;
> 
> -  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_fms_pattern::mkInstance (node, &ops, ifn);
> 
> -  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_mul_pattern::mkInstance (node, &ops, ifn);
> 
> -  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_add_pattern::mkInstance (node, &ops, ifn);
> 
> @@ -1398,11 +1449,13 @@ class addsub_pattern : public vect_pattern
>      void build (vec_info *);
> 
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +	       slp_tree *);
>  };
> 
>  vect_pattern *
> -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
> +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
> +			   slp_compat_nodes_map_t *, slp_tree *node_)
>  {
>    slp_tree node = *node_;
>    if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
>  /* Return true if call statements CALL1 and CALL2 are similar enough
>     to be combined into the same SLP group.  */
> 
> -static bool
> +bool
>  compatible_calls_p (gcall *call1, gcall *call2)
>  {
>    unsigned int nargs = gimple_call_num_args (call1);
> @@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
>  static bool
>  vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>  			   slp_tree_to_load_perm_map_t *perm_cache,
> +			   slp_compat_nodes_map_t *compat_cache,
>  			   hash_set<slp_tree> *visited)
>  {
>    unsigned i;
> @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>    slp_tree child;
>    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
>      found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
> -					  vinfo, perm_cache, visited);
> +					  vinfo, perm_cache, compat_cache,
> +					  visited);
> 
>    for (unsigned x = 0; x < num__slp_patterns; x++)
>      {
> -      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
> +      vect_pattern *pattern
> +	= slp_patterns[x] (perm_cache, compat_cache, ref_node);
>        if (pattern)
>  	{
>  	  pattern->build (vinfo);
> @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>  static bool
>  vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
>  			 hash_set<slp_tree> *visited,
> -			 slp_tree_to_load_perm_map_t *perm_cache)
> +			 slp_tree_to_load_perm_map_t *perm_cache,
> +			 slp_compat_nodes_map_t *compat_cache)
>  {
>    DUMP_VECT_SCOPE ("vect_match_slp_patterns");
>    slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> @@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
>  		     "Analyzing SLP tree %p for patterns\n",
>  		     SLP_INSTANCE_TREE (instance));
> 
> -  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
> +  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
> +				    visited);
>  }
> 
>  /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
> @@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
> 
>    hash_set<slp_tree> visited_patterns;
>    slp_tree_to_load_perm_map_t perm_cache;
> +  slp_compat_nodes_map_t compat_cache;
> 
>    /* See if any patterns can be found in the SLP tree.  */
>    bool pattern_found = false;
>    FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
>      pattern_found |= vect_match_slp_patterns (instance, vinfo,
> -					      &visited_patterns, &perm_cache);
> +					      &visited_patterns, &perm_cache,
> +					      &compat_cache);
> 
>    /* If any were found optimize permutations of loads.  */
>    if (pattern_found)
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
>  extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
>  extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
>  extern void vect_free_slp_tree (slp_tree);
> +extern bool compatible_calls_p (gcall *, gcall *);
> 
>  /* In tree-vect-patterns.c.  */
>  extern void
> @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds {
>  typedef hash_map <slp_tree, complex_perm_kinds_t>
>    slp_tree_to_load_perm_map_t;
> 
> +/* Cache from nodes pair to being compatible or not.  */
> +typedef pair_hash <nofree_ptr_hash <_slp_tree>,
> +		   nofree_ptr_hash <_slp_tree>> slp_node_hash;
> +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
> +
> +
>  /* Vector pattern matcher base class.  All SLP pattern matchers must inherit
>     from this type.  */
> 
> @@ -2338,7 +2345,8 @@ class vect_pattern
>    public:
> 
>      /* Create a new instance of the pattern matcher class of the given type.  */
> -    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
> +				    slp_compat_nodes_map_t *, slp_tree *);
> 
>      /* Build the pattern from the data collected so far.  */
>      virtual void build (vec_info *) = 0;
> @@ -2352,6 +2360,7 @@ class vect_pattern
>
>  /* Function pointer to create a new pattern matcher from a generic type.  */
>  typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
> +					      slp_compat_nodes_map_t *,
>  					      slp_tree *);
> 
>  /* List of supported pattern matchers.  */


* Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
  2021-12-17 15:42 [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Tamar Christina
                   ` (2 preceding siblings ...)
  2021-12-17 16:18 ` [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Richard Sandiford
@ 2022-01-10 13:00 ` Richard Biener
  2022-01-11  7:31   ` Tamar Christina
  3 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2022-01-10 13:00 UTC (permalink / raw)
  To: Tamar Christina; +Cc: GCC Patches, nd, Richard Guenther

On Fri, Dec 17, 2021 at 4:44 PM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> This patch strengthens the analysis for complex mul, fma and fms in order to
> ensure that it doesn't create incorrect output.
>
> Essentially it adds an extra verification to check that the two nodes it's going
> to combine do the same operations on compatible values.  The reason it needs to
> do this is that if one computation differs from the other, the current
> implementation has no way to deal with it, since we have to remove the
> permute.
>
> When we can keep the permute around we can probably handle these by unrolling.
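A minimal sketch of the kind of pairwise check described above, written against a hypothetical toy node type rather than GCC's `slp_tree`: two nodes are treated as compatible when they have the same operation code and arity, equal leaf values, and pairwise-compatible children. As `compatible_complex_nodes_p` in the patch does with `compat_cache`, the sketch records a pessimistic `false` for the pair before recursing and overwrites it with `true` only on success. The names `toy_node`, `memo_get`, `memo_put` and `toy_compatible_p` are illustrative assumptions, and the fixed-size linear table merely stands in for a hash map keyed on the node pair.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node: CODE identifies the operation, VALUE is only
   meaningful for leaves (NCHILDREN == 0).  */
struct toy_node
{
  int code;
  int value;
  size_t nchildren;
  const struct toy_node *child[2];
};

/* Fixed-size memo keyed on the ordered (A, B) pointer pair, standing in
   for a hash map.  If it fills up, results are simply not cached.  */
#define MEMO_SIZE 64
struct memo_entry { const struct toy_node *a, *b; bool compat; };
static struct memo_entry memo[MEMO_SIZE];
static size_t memo_len;

static bool *
memo_get (const struct toy_node *a, const struct toy_node *b)
{
  for (size_t i = 0; i < memo_len; i++)
    if (memo[i].a == a && memo[i].b == b)
      return &memo[i].compat;
  return NULL;
}

static void
memo_put (const struct toy_node *a, const struct toy_node *b, bool compat)
{
  bool *slot = memo_get (a, b);
  if (slot)
    *slot = compat;
  else if (memo_len < MEMO_SIZE)
    memo[memo_len++] = (struct memo_entry) { a, b, compat };
}

/* Return true if A and B perform the same operations on equal leaves.  */
static bool
toy_compatible_p (const struct toy_node *a, const struct toy_node *b)
{
  bool *cached = memo_get (a, b);
  if (cached)
    return *cached;

  /* Assume "incompatible" until proven otherwise, so that a cycle back
     to this pair terminates instead of recursing forever.  */
  memo_put (a, b, false);

  if (a->code != b->code || a->nchildren != b->nchildren)
    return false;
  if (a->nchildren == 0 && a->value != b->value)
    return false;
  for (size_t i = 0; i < a->nchildren; i++)
    if (!toy_compatible_p (a->child[i], b->child[i]))
      return false;

  memo_put (a, b, true);
  return true;
}
```

Repeated queries for the same ordered pair are answered from the memo table without walking the children again.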
>
> While implementing this, since I have to do the traversal anyway, I took
> advantage of it to simplify the code a bit.  Previously we would determine
> whether something is a conjugate, then try to figure out which conjugate it
> is, and then try to see if the permutes match what we expect.
>
> Now the code that does the traversal will detect this in one go and return to us
> whether the operation is something that can be combined and whether a conjugate
> is present.
>
> Secondly, because it does this, I can now simplify the checking code itself
> to essentially just apply fixed patterns to each operation.
>
> The patterns represent the order in which operations should appear.  For
> instance, a complex MUL operation combines:
>
>   Left 1 + Right 1
>   Left 2 + Right 2
>
> with a permute on the nodes consisting of:
>
>   { Even, Even } + { Odd, Odd  }
>   { Even, Odd  } + { Odd, Even }
>
> By abstracting over these patterns the checking code becomes quite simple.
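To illustrate the Even/Odd permute pattern above, here is a minimal scalar sketch, assuming complex values are stored interleaved as {re0, im0, re1, im1, ...}. The product is built from two lane-wise terms: the real parts of A duplicated into both lanes ({Even, Even}) times B as loaded ({Even, Odd}), and the imaginary parts of A duplicated ({Odd, Odd}) times B with its lanes swapped ({Odd, Even}). The second term is then subtracted in the real lane and added in the imaginary lane. The function name `cmul_interleaved` is hypothetical, and this is not how the vectorizer emits code.

```c
#include <stddef.h>

/* Hypothetical sketch: C[k] = A[k] * B[k] for N interleaved complex
   floats, written in terms of the two permuted products.  */
static void
cmul_interleaved (const float *a, const float *b, float *c, size_t n)
{
  for (size_t k = 0; k < n; k++)
    {
      size_t re = 2 * k, im = 2 * k + 1;  /* even and odd lanes */

      /* Term 1: {a.re, a.re} * {b.re, b.im}, i.e. {Even, Even} * {Even, Odd}.  */
      float t1_re = a[re] * b[re];
      float t1_im = a[re] * b[im];

      /* Term 2: {a.im, a.im} * {b.im, b.re}, i.e. {Odd, Odd} * {Odd, Even}.  */
      float t2_re = a[im] * b[im];
      float t2_im = a[im] * b[re];

      /* Subtract in the even (real) lane, add in the odd (imaginary) lane.  */
      c[re] = t1_re - t2_re;
      c[im] = t1_im + t2_im;
    }
}
```

For example, (1 + 2i) * (3 + 4i) yields the lanes {-5, 10}.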
>
> As part of this I checked the order of the operands, which was left in "SLP"
> order, i.e. the same order they showed up in during SLP, which means that the
> accumulator is first.  However, it looks like I didn't document this, and the
> x86 optab was implemented assuming the same order as FMA, i.e. that the
> accumulator is last.
>
> I have thus changed the order to match that of FMA and FMS, which corrects the
> x86 codegen, and I will update the Arm targets to match.  This has now also been
> documented.
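To make the operand convention concrete, here is a minimal scalar reference sketch, assuming the FMA-like order described above in which the accumulator is the last input, as in `fma (x, y, z)` from <math.h>. `cfma_ref` computes op0 * op1 + acc and `cfma_conj_ref` conjugates the second multiplicand first. Both helper names are hypothetical; this illustrates only the argument order and is not the optab interface itself.

```c
#include <complex.h>

/* Hypothetical reference semantics: accumulator last, as in fma (x, y, z).  */
static float complex
cfma_ref (float complex op0, float complex op1, float complex acc)
{
  return op0 * op1 + acc;
}

/* Same, with the second multiplicand conjugated.  */
static float complex
cfma_conj_ref (float complex op0, float complex op1, float complex acc)
{
  return op0 * conjf (op1) + acc;
}
```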
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu with no regressions.
>
> OK for master?  And for backport to GCC 11 after some stew?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR tree-optimization/102819
>         PR tree-optimization/103169
>         * doc/md.texi: Update docs for cfms, cfma.
>         * tree-data-ref.h (same_data_refs): Accept optional offset.
>         * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
>         patterns.
>         (vect_normalize_conj_loc): Remove.
>         (is_eq_or_top): Change to take two nodes.
>         (enum _conj_status, compatible_complex_nodes_p,
>         vect_validate_multiplication): New.
>         (class complex_add_pattern, complex_add_pattern::matches,
>         complex_add_pattern::recognize, class complex_mul_pattern,
>         complex_mul_pattern::recognize, class complex_fms_pattern,
>         complex_fms_pattern::recognize, class complex_operations_pattern,
>         complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
>         new cache.
>         (complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
>         cache and use new validation code.
>         * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
>         vect_analyze_slp): Pass along cache.
>         (compatible_calls_p): Expose.
>         * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
>         slp_compat_nodes_map_t): New.
>         (class vect_pattern): Update signatures to include new cache.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/102819
>         PR tree-optimization/103169
>         * g++.dg/vect/pr99149.cc: xfail for now.
>         * gcc.dg/vect/complex/pr102819-1.c: New test.
>         * gcc.dg/vect/complex/pr102819-2.c: New test.
>         * gcc.dg/vect/complex/pr102819-3.c: New test.
>         * gcc.dg/vect/complex/pr102819-4.c: New test.
>         * gcc.dg/vect/complex/pr102819-5.c: New test.
>         * gcc.dg/vect/complex/pr102819-6.c: New test.
>         * gcc.dg/vect/complex/pr102819-7.c: New test.
>         * gcc.dg/vect/complex/pr102819-8.c: New test.
>         * gcc.dg/vect/complex/pr102819-9.c: New test.
>         * gcc.dg/vect/complex/pr103169.c: New test.
>
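The is_linear_load_p ChangeLog entry ("Fix issue with repeating patterns") matches the `load % 2` change in the is_linear_load_p hunk of tree-vect-slp-patterns.c. A load permutation that duplicates real or imaginary parts across several complex numbers, such as {0, 0, 2, 2}, should still count as {Even, Even}, whereas comparing `load` itself against 0 or 1 (as the old code did) rejects it. The following is a minimal standalone sketch of that kind of classification, assuming a load permutation is just an array of lane indices. The enum and function names are hypothetical, and this is not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the patch uses its own complex_perm_kinds_t enum.  */
enum toy_perm { TOY_UNKNOWN, TOY_ODDODD, TOY_EVENEVEN, TOY_EVENODD, TOY_ODDEVEN };

/* Classify the lane indices in LOADS[0 .. N-1].  */
static enum toy_perm
toy_classify_loads (const unsigned *loads, size_t n)
{
  if (n == 0)
    return TOY_UNKNOWN;

  bool oddodd = true, eveneven = true, evenodd = true, oddeven = true;
  for (size_t i = 0; i < n; i++)
    {
      /* Reduce to the lane within one complex number: 0 = real, 1 = imag.
         Without this, {0, 0, 2, 2} would not be seen as all-real.  */
      unsigned lane = loads[i] % 2;
      oddodd = oddodd && lane == 1;
      eveneven = eveneven && lane == 0;
      evenodd = evenodd && loads[i] == i;
      oddeven = oddeven && loads[i] == (i % 2 == 0 ? i + 1 : i - 1);
    }

  if (oddodd)
    return TOY_ODDODD;
  if (eveneven)
    return TOY_EVENEVEN;
  if (evenodd)
    return TOY_EVENODD;
  if (oddeven)
    return TOY_ODDEVEN;
  return TOY_UNKNOWN;
}
```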
> --- inline copy of patch --
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
>  a multiply and accumulate of complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] += a[i] * b[i];
> +      op2[i] += op0[i] * op1[i];
>      @}
>  @end smallexample
>
> @@ -6348,12 +6348,12 @@ the same as a multiply and accumulate of complex numbers where the second
>  multiply arguments is conjugated.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] += a[i] * conj (b[i]);
> +      op2[i] += op0[i] * conj (op1[i]);
>      @}
>  @end smallexample
>
> @@ -6370,12 +6370,12 @@ Perform a vector multiply and subtract that is semantically the same as
>  a multiply and subtract of complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] -= a[i] * b[i];
> +      op2[i] -= op0[i] * op1[i];
>      @}
>  @end smallexample
>
> @@ -6393,12 +6393,12 @@ the same as a multiply and subtract of complex numbers where the second
>  multiply arguments is conjugated.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] -= a[i] * conj (b[i]);
> +      op2[i] -= op0[i] * conj (op1[i]);
>      @}
>  @end smallexample
>
> @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
>  complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] = a[i] * b[i];
> +      op2[i] = op0[i] * op1[i];
>      @}
>  @end smallexample
>
> @@ -6437,12 +6437,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
>  multiply of complex numbers where the second multiply arguments is conjugated.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>    for (int i = 0; i < N; i += 1)
>      @{
> -      c[i] = a[i] * conj (b[i]);
> +      op2[i] = op0[i] * conj (op1[i]);
>      @}
>  @end smallexample
>
> diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
> index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
> --- a/gcc/testsuite/g++.dg/vect/pr99149.cc
> +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
> @@ -24,4 +24,4 @@ public:
>  } n;
>  main() { n.j(); }
>
> -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
> +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 4)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
> +      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
> +      //                  ^^^^^^^             ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good1(float v1, float v2)
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good1()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void good2()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
> +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad1()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
> +      //                  ^^^^^^^             ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad2()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
> +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
> +      //                          ^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +float f[12][100];
> +
> +void bad3()
> +{
> +  for (int r = 0; r < 100; r += 2)
> +    {
> +      int i = r + 1;
> +      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
> +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> +      //                            ^^^^^^^
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +
> +#include <stdio.h>
> +#include <complex.h>
> +
> +#define N 200
> +#define TYPE float
> +#define TYPE2 float
> +
> +void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> +{
> +  for (int i=0; i < N; i++)
> +    {
> +      c[i] -=  a[i] * b[0];
> +    }
> +}
> +
> +/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.  */
> +
> +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target { vect_double } } } */
> +/* { dg-add-options arm_v8_3a_complex_neon } */
> +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
> +
> +_Complex double b_0, c_0;
> +
> +void
> +mul270snd (void)
> +{
> +  c_0 = b_0 * 1.0iF * 1.0iF;
> +}
> +
> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
>  }
>
>  /* Return true when the data references A and B are accessing the same
> -   memory object with the same access functions.  */
> +   memory object with the same access functions.  Optionally skip the
> +   last OFFSET dimensions in the data reference.  */

But you skip the _first_ dimensions?
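The following is a minimal model of the loop in question, assuming each data reference is reduced to an array of per-dimension access-function identifiers. The struct and function names are hypothetical. Starting the loop at OFFSET means that indices 0 .. OFFSET-1 are never compared. Which source-level subscripts those indices correspond to depends on the order in which the data-reference analysis records access functions, and this sketch does not model that.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a data reference: one identifier per dimension's
   access function.  */
struct toy_dr
{
  size_t ndims;
  const int *access_fn;
};

/* Mirrors the patched loop: dimensions with index < OFFSET are skipped.  */
static bool
toy_same_data_refs (const struct toy_dr *a, const struct toy_dr *b,
		    size_t offset)
{
  if (a->ndims != b->ndims)
    return false;
  for (size_t i = offset; i < a->ndims; i++)
    if (a->access_fn[i] != b->access_fn[i])
      return false;
  return true;
}
```

Two references that differ only at index 0 compare unequal with OFFSET 0 but equal with OFFSET 1.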

Otherwise looks OK to me.

Thanks,
Richard.

>  static inline bool
> -same_data_refs (data_reference_p a, data_reference_p b)
> +same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
>  {
>    unsigned int i;
>
> @@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
>    if (!same_data_refs_base_objects (a, b))
>      return false;
>
> -  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> +  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
>      if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
>        return false;
>
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 0350441fad9690cd5d04337171ca3470a064a571..f8da4153632a700680091f37305a5d3078fbb0c5 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
>    int valid_patterns = 4;
>    FOR_EACH_VEC_ELT (loads, i, load)
>      {
> -      if (candidates[0] != PERM_UNKNOWN && load != 1)
> +      unsigned adj_load = load % 2;
> +      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
>         {
>           candidates[0] = PERM_UNKNOWN;
>           valid_patterns--;
>         }
> -      if (candidates[1] != PERM_UNKNOWN && load != 0)
> +      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
>         {
>           candidates[1] = PERM_UNKNOWN;
>           valid_patterns--;
> @@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -            vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +            slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
>
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +              slp_tree *);
>
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
>  internal_fn
>  complex_add_pattern::matches (complex_operation_t op,
>                               slp_tree_to_load_perm_map_t *perm_cache,
> +                             slp_compat_nodes_map_t * /* compat_cache */,
>                               slp_tree *node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
>
>  vect_pattern*
>  complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                               slp_compat_nodes_map_t *compat_cache,
>                                 slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_add_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
>
> @@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
>   * complex_mul_pattern
>   ******************************************************************************/
>
> -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
> -   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> -
> -   If a negate is found then the values in ARGS are reordered such that the
> -   negate node is always the second one and the entry is replaced by the child
> -   of the negate node.  */
> +/* Helper function to check if PERM is KIND or PERM_TOP.  */
>
>  static inline bool
> -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL)
> +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
> +             slp_tree op1, complex_perm_kinds_t kind1,
> +             slp_tree op2, complex_perm_kinds_t kind2)
>  {
> -  gcc_assert (args.length () == 2);
> -  bool neg_found = false;
> -
> -  if (vect_match_expression_p (args[0], NEGATE_EXPR))
> -    {
> -      std::swap (args[0], args[1]);
> -      neg_found = true;
> -      if (neg_first_p)
> -       *neg_first_p = true;
> -    }
> -  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> -    {
> -      neg_found = true;
> -      if (neg_first_p)
> -       *neg_first_p = false;
> -    }
> +  complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
> +  if (perm1 != kind1 && perm1 != PERM_TOP)
> +    return false;
>
> -  if (neg_found)
> -    args[1] = SLP_TREE_CHILDREN (args[1])[0];
> +  complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
> +  if (perm2 != kind2 && perm2 != PERM_TOP)
> +    return false;
>
> -  return neg_found;
> +  return true;
>  }
>
> -/* Helper function to check if PERM is KIND or PERM_TOP.  */
> +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
>
>  static inline bool
> -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
> +compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
> +                           slp_tree a, int *pa, slp_tree b, int *pb)
>  {
> -  return perm == kind || perm == PERM_TOP;
> -}
> +  bool *tmp;
> +  std::pair<slp_tree, slp_tree> key = std::make_pair(a, b);
> +  if ((tmp = compat_cache->get (key)) != NULL)
> +    return *tmp;
>
> -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
> -   nodes but also that they represent an operation that is either a complex
> -   multiplication or a complex multiplication by conjugated value.
> +  compat_cache->put (key, false);
>
> -   Of the negation is expected to be in the first half of the tree (As required
> -   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjugate
> -   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
> -   second operand contains the conjugate operation.  */
> +  if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
> +    return false;
>
> -static inline bool
> -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> -                             const vec<slp_tree> &left_op,
> -                             const vec<slp_tree> &right_op,
> -                            bool neg_first, bool *conj_first_operand,
> -                            bool fms)
> -{
> -  /* The presence of a negation indicates that we have either a conjugate or a
> -     rotation.  We need to distinguish which one.  */
> -  *conj_first_operand = false;
> -  complex_perm_kinds_t kind;
> -
> -  /* Complex conjugates have the negation on the imaginary part of the
> -     number where rotations affect the real component.  So check if the
> -     negation is on a dup of lane 1.  */
> -  if (fms)
> +  if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
> +    return false;
> +
> +  /* Only internal nodes can be loads, as such we can't check further if they
> +     are externals.  */
> +  if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
>      {
> -      /* Canonicalization for fms is not consistent. So have to test both
> -        variants to be sure.  This needs to be fixed in the mid-end so
> -        this part can be simpler.  */
> -      kind = linear_loads_p (perm_cache, right_op[0]);
> -      if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
> -          && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> -                            PERM_ODDEVEN))
> -         || (kind == PERM_ODDEVEN
> -             && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> -                            PERM_ODDODD))))
> -       return false;
> +      for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
> +       {
> +         tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
> +         tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
> +         if (!operand_equal_p (op1, op2, 0))
> +           return false;
> +       }
> +
> +      compat_cache->put (key, true);
> +      return true;
> +    }
> +
> +  auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
> +  auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
> +
> +  if (gimple_code (a_stmt) != gimple_code (b_stmt))
> +    return false;
> +
> +  /* code, children, type, externals, loads, constants  */
> +  if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
> +    return false;
> +
> +  /* At this point, a and b are known to be the same gimple operations.  */
> +  if (is_gimple_call (a_stmt))
> +    {
> +       if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
> +                                dyn_cast <gcall *> (b_stmt)))
> +         return false;
>      }
> +  else if (!is_gimple_assign (a_stmt))
> +    return false;
>    else
>      {
> -      if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
> -         && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
> -                           PERM_ODDEVEN))
> +      tree_code acode = gimple_assign_rhs_code (a_stmt);
> +      tree_code bcode = gimple_assign_rhs_code (b_stmt);
> +      if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
> +         && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
> +       return true;
> +
> +      if (acode != bcode)
>         return false;
>      }
>
> -  /* Deal with differences in indexes.  */
> -  int index1 = fms ? 1 : 0;
> -  int index2 = fms ? 0 : 1;
> -
> -  /* Check if the conjugate is on the second first or second operand.  The
> -     order of the node with the conjugate value determines this, and the dup
> -     node must be one of lane 0 of the same DR as the neg node.  */
> -  kind = linear_loads_p (perm_cache, left_op[index1]);
> -  if (kind == PERM_TOP)
> +  if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
> +      || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
>      {
> -      if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
> -       return true;
> +      for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
> +       {
> +         tree t1 = gimple_arg (a_stmt, i);
> +         tree t2 = gimple_arg (b_stmt, i);
> +         if (TREE_CODE (t1) != TREE_CODE (t2))
> +           return false;
> +
> +         /* If SSA name then we will need to inspect the children
> +            so we can punt here.  */
> +         if (TREE_CODE (t1) == SSA_NAME)
> +           continue;
> +
> +         if (!operand_equal_p (t1, t2, 0))
> +           return false;
> +       }
>      }
> -  else if (kind == PERM_EVENODD && !neg_first)
> +  else
>      {
> -      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN)
> +      auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
> +      auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
> +      /* Don't check the last dimension as that's checked by the linearity
> +        checks.  This check is also much stricter than what we need
> +        because it doesn't consider loading from adjacent elements
> +        in the same struct as loading from the same base object.
> +        But for now, I'll play it safe.  */
> +      if (!same_data_refs (dr1, dr2, 1))
>         return false;
> -      return true;
>      }
> -  else if (kind == PERM_EVENEVEN && neg_first)
> +
> +  for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
>      {
> -      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD)
> +      if (!compatible_complex_nodes_p (compat_cache,
> +                                      SLP_TREE_CHILDREN (a)[i], pa,
> +                                      SLP_TREE_CHILDREN (b)[i], pb))
>         return false;
> -
> -      *conj_first_operand = true;
> -      return true;
>      }
> -  else
> -    return false;
> -
> -  if (kind != PERM_EVENEVEN)
> -    return false;
>
> +  compat_cache->put (key, true);
>    return true;
>  }
>
> -/* Helper function to help distinguish between a conjugate and a rotation in a
> -   complex multiplication.  The operations have similar shapes but the order of
> -   the load permutes are different.  This function returns TRUE when the order
> -   is consistent with a multiplication or multiplication by conjugated
> -   operand but returns FALSE if it's a multiplication by rotated operand.  */
> -
>  static inline bool
>  vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> -                             const vec<slp_tree> &op,
> -                             complex_perm_kinds_t permKind)
> +                             slp_compat_nodes_map_t *compat_cache,
> +                             vec<slp_tree> &left_op,
> +                             vec<slp_tree> &right_op,
> +                             bool subtract,
> +                             enum _conj_status *_status)
>  {
> -  /* The left node is the more common case, test it first.  */
> -  if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
> +  auto_vec<slp_tree> ops;
> +  enum _conj_status stats = CONJ_NONE;
> +
> +  /* The complex operations can occur in two layouts and two permute sequences
> +     so declare them and re-use them.  */
> +  int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}.  */
> +                   , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}.  */
> +                   };
> +
> +  /* Now for the corresponding permutes that go with these values.  */
> +  complex_perm_kinds_t perms[][4]
> +    = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
> +      , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
> +      };
> +
> +  /* These permutes are used during comparisons of externals on which
> +     we require strict equality.  */
> +  int cq[][4][2]
> +    = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
> +      , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
> +      };
> +
> +  /* Default to style and perm 0, most operations use this one.  */
> +  int style = 0;
> +  int perm = subtract ? 1 : 0;
> +
> +  /* Check if we have a negate operation, if so absorb the node and continue
> +     looking.  */
> +  bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
> +  bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
> +
> +  /* Determine which style we're looking at.  We only have different ones
> +     whenever a conjugate is involved.  */
> +  if (neg0 && neg1)
> +    ;
> +  else if (neg0)
>      {
> -      if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
> -       return false;
> +      right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
> +      stats = CONJ_FST;
> +      if (subtract)
> +       perm = 0;
>      }
> -  return true;
> +  else if (neg1)
> +    {
> +      right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
> +      stats = CONJ_SND;
> +      perm = 1;
> +    }
> +
> +  *_status = stats;
> +
> +  /* Flatten the inputs after we've remapped them.  */
> +  ops.create (4);
> +  ops.safe_splice (left_op);
> +  ops.safe_splice (right_op);
> +
> +  /* Extract out the elements to check.  */
> +  slp_tree op0 = ops[styles[style][0]];
> +  slp_tree op1 = ops[styles[style][1]];
> +  slp_tree op2 = ops[styles[style][2]];
> +  slp_tree op3 = ops[styles[style][3]];
> +
> +  /* Do cheapest test first.  If failed no need to analyze further.  */
> +  if (linear_loads_p (perm_cache, op0) != perms[perm][0]
> +      || linear_loads_p (perm_cache, op1) != perms[perm][1]
> +      || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
> +    return false;
> +
> +  return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
> +                                    cq[perm][1])
> +        && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
> +                                       cq[perm][3]);
>  }
>
>  /* This function combines two nodes containing only even and only odd lanes
> @@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -            vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +            slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
>
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +              slp_tree *);
>
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern
>  internal_fn
>  complex_mul_pattern::matches (complex_operation_t op,
>                               slp_tree_to_load_perm_map_t *perm_cache,
> +                             slp_compat_nodes_map_t *compat_cache,
>                               slp_tree *node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op,
>        || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
>      return IFN_LAST;
>
> -  bool neg_first = false;
> -  bool conj_first_operand = false;
> -  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> +  enum _conj_status status;
> +  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
> +                                    right_op, false, &status))
> +    return IFN_LAST;
>
> -  if (!is_neg)
> +  if (status == CONJ_NONE)
>      {
> -      /* A multiplication needs to multiply agains the real pair, otherwise
> -        the pattern matches that of FMS.   */
> -      if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
> -         || vect_normalize_conj_loc (left_op))
> -       return IFN_LAST;
>        if (add0)
>         ifn = IFN_COMPLEX_FMA;
>        else
> @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op,
>      }
>    else
>      {
> -      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> -                                        neg_first, &conj_first_operand,
> -                                        false))
> -       return IFN_LAST;
> -
>        if(add0)
>         ifn = IFN_COMPLEX_FMA_CONJ;
>        else
> @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op,
>      ops->quick_push (add0);
>
>    complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
> -  if (kind == PERM_EVENODD)
> -    {
> -      ops->quick_push (left_op[1]);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (left_op[0]);
> -    }
> -  else if (kind == PERM_TOP)
> +  if (kind == PERM_EVENODD || kind == PERM_TOP)
>      {
>        ops->quick_push (left_op[1]);
>        ops->quick_push (right_op[1]);
>        ops->quick_push (left_op[0]);
>      }
> -  else if (kind == PERM_EVENEVEN && !conj_first_operand)
> +  else if (kind == PERM_EVENEVEN && status != CONJ_SND)
>      {
>        ops->quick_push (left_op[0]);
>        ops->quick_push (right_op[0]);
> @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op,
>
>  vect_pattern*
>  complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                               slp_compat_nodes_map_t *compat_cache,
>                                 slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
>
> @@ -1097,8 +1156,8 @@ complex_mul_pattern::build (vec_info *vinfo)
>
>         /* First re-arrange the children.  */
>         SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2);
> -       SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2];
> -       SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
> +       SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
> +       SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[2];
>         break;
>        }
>      case IFN_COMPLEX_FMA:
> @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo)
>
>         /* First re-arrange the children.  */
>         SLP_TREE_CHILDREN (*this->m_node).safe_grow (3);
> -       SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0];
> +       SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
>         SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3];
> -       SLP_TREE_CHILDREN (*this->m_node)[2] = newnode;
> +       SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0];
>
>         /* Tell the builder to expect an extra argument.  */
>         this->m_num_args++;
> @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -            vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +            slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
>
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +              slp_tree *);
>
>      static vect_pattern*
>      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> @@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern
>  internal_fn
>  complex_fms_pattern::matches (complex_operation_t op,
>                               slp_tree_to_load_perm_map_t *perm_cache,
> +                             slp_compat_nodes_map_t *compat_cache,
>                               slp_tree * ref_node, vec<slp_tree> *ops)
>  {
>    internal_fn ifn = IFN_LAST;
> @@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op,
>    if (!vect_match_expression_p (root, MINUS_EXPR))
>      return IFN_LAST;
>
> +  /* TODO: Support invariants here, with the new layout CADD now
> +          can match before we get a chance to try CFMS.  */
>    auto nodes = SLP_TREE_CHILDREN (root);
>    if (!vect_match_expression_p (nodes[1], MULT_EXPR)
>        || vect_detect_pair_op (nodes[0]) != PLUS_MINUS)
> @@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op,
>        || !vect_match_expression_p (l0node[1], MULT_EXPR))
>      return IFN_LAST;
>
> -  bool is_neg = vect_normalize_conj_loc (left_op);
> -
> -  bool conj_first_operand = false;
> -  if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
> -                                    &conj_first_operand, true))
> +  enum _conj_status status;
> +  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
> +                                    left_op, true, &status))
>      return IFN_LAST;
>
> -  if (!is_neg)
> +  if (status == CONJ_NONE)
>      ifn = IFN_COMPLEX_FMS;
> -  else if (is_neg)
> +  else
>      ifn = IFN_COMPLEX_FMS_CONJ;
>
>    if (!vect_pattern_validate_optab (ifn, *ref_node))
> @@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op,
>        ops->quick_push (right_op[1]);
>        ops->quick_push (left_op[1]);
>      }
> -  else if (kind == PERM_TOP)
> -    {
> -      ops->quick_push (l0node[0]);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[0]);
> -    }
> -  else if (kind == PERM_EVENEVEN && !is_neg)
> -    {
> -      ops->quick_push (l0node[0]);
> -      ops->quick_push (right_op[1]);
> -      ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[0]);
> -    }
>    else
>      {
>        ops->quick_push (l0node[0]);
>        ops->quick_push (right_op[1]);
>        ops->quick_push (right_op[0]);
> -      ops->quick_push (left_op[1]);
> +      ops->quick_push (left_op[0]);
>      }
>
>    return ifn;
> @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op,
>
>  vect_pattern*
>  complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                               slp_compat_nodes_map_t *compat_cache,
>                                 slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
>    complex_operation_t op
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn
> -    = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> +    = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
>    if (ifn == IFN_LAST)
>      return NULL;
>
> @@ -1305,9 +1353,24 @@ complex_fms_pattern::build (vec_info *vinfo)
>    SLP_TREE_CHILDREN (*this->m_node).create (3);
>
>    /* First re-arrange the children.  */
> +  switch (this->m_ifn)
> +  {
> +    case IFN_COMPLEX_FMS:
> +      {
> +       SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> +       SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> +       break;
> +      }
> +    case IFN_COMPLEX_FMS_CONJ:
> +      {
> +       SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> +       SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> +       break;
> +      }
> +    default:
> +      gcc_unreachable ();
> +  }
>    SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
> -  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> -  SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
>
>    /* And then rewrite the node itself.  */
>    complex_pattern::build (vinfo);
> @@ -1334,11 +1397,12 @@ class complex_operations_pattern : public complex_pattern
>    public:
>      void build (vec_info *);
>      static internal_fn
> -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> -            vec<slp_tree> *);
> +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +            slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
>
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +              slp_tree *);
>  };
>
>  /* Dummy matches implementation for proxy object.  */
> @@ -1347,6 +1411,7 @@ internal_fn
>  complex_operations_pattern::
>  matches (complex_operation_t /* op */,
>          slp_tree_to_load_perm_map_t * /* perm_cache */,
> +        slp_compat_nodes_map_t * /* compat_cache */,
>          slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
>  {
>    return IFN_LAST;
> @@ -1356,6 +1421,7 @@ matches (complex_operation_t /* op */,
>
>  vect_pattern*
>  complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +                                      slp_compat_nodes_map_t *ccache,
>                                        slp_tree *node)
>  {
>    auto_vec<slp_tree> ops;
> @@ -1363,15 +1429,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
>      = vect_detect_pair_op (*node, true, &ops);
>    internal_fn ifn = IFN_LAST;
>
> -  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_fms_pattern::mkInstance (node, &ops, ifn);
>
> -  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_mul_pattern::mkInstance (node, &ops, ifn);
>
> -  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
> +  ifn  = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
>    if (ifn != IFN_LAST)
>      return complex_add_pattern::mkInstance (node, &ops, ifn);
>
> @@ -1398,11 +1464,13 @@ class addsub_pattern : public vect_pattern
>      void build (vec_info *);
>
>      static vect_pattern*
> -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> +              slp_tree *);
>  };
>
>  vect_pattern *
> -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
> +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
> +                          slp_compat_nodes_map_t *, slp_tree *node_)
>  {
>    slp_tree node = *node_;
>    if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
>  /* Return true if call statements CALL1 and CALL2 are similar enough
>     to be combined into the same SLP group.  */
>
> -static bool
> +bool
>  compatible_calls_p (gcall *call1, gcall *call2)
>  {
>    unsigned int nargs = gimple_call_num_args (call1);
> @@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
>  static bool
>  vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>                            slp_tree_to_load_perm_map_t *perm_cache,
> +                          slp_compat_nodes_map_t *compat_cache,
>                            hash_set<slp_tree> *visited)
>  {
>    unsigned i;
> @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>    slp_tree child;
>    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
>      found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
> -                                         vinfo, perm_cache, visited);
> +                                         vinfo, perm_cache, compat_cache,
> +                                         visited);
>
>    for (unsigned x = 0; x < num__slp_patterns; x++)
>      {
> -      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
> +      vect_pattern *pattern
> +       = slp_patterns[x] (perm_cache, compat_cache, ref_node);
>        if (pattern)
>         {
>           pattern->build (vinfo);
> @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
>  static bool
>  vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
>                          hash_set<slp_tree> *visited,
> -                        slp_tree_to_load_perm_map_t *perm_cache)
> +                        slp_tree_to_load_perm_map_t *perm_cache,
> +                        slp_compat_nodes_map_t *compat_cache)
>  {
>    DUMP_VECT_SCOPE ("vect_match_slp_patterns");
>    slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> @@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
>                      "Analyzing SLP tree %p for patterns\n",
>                      SLP_INSTANCE_TREE (instance));
>
> -  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
> +  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
> +                                   visited);
>  }
>
>  /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
> @@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>
>    hash_set<slp_tree> visited_patterns;
>    slp_tree_to_load_perm_map_t perm_cache;
> +  slp_compat_nodes_map_t compat_cache;
>
>    /* See if any patterns can be found in the SLP tree.  */
>    bool pattern_found = false;
>    FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
>      pattern_found |= vect_match_slp_patterns (instance, vinfo,
> -                                             &visited_patterns, &perm_cache);
> +                                             &visited_patterns, &perm_cache,
> +                                             &compat_cache);
>
>    /* If any were found optimize permutations of loads.  */
>    if (pattern_found)
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
>  extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
>  extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
>  extern void vect_free_slp_tree (slp_tree);
> +extern bool compatible_calls_p (gcall *, gcall *);
>
>  /* In tree-vect-patterns.c.  */
>  extern void
> @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds {
>  typedef hash_map <slp_tree, complex_perm_kinds_t>
>    slp_tree_to_load_perm_map_t;
>
> +/* Cache from nodes pair to being compatible or not.  */
> +typedef pair_hash <nofree_ptr_hash <_slp_tree>,
> +                  nofree_ptr_hash <_slp_tree>> slp_node_hash;
> +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
> +
> +
>  /* Vector pattern matcher base class.  All SLP pattern matchers must inherit
>     from this type.  */
>
> @@ -2338,7 +2345,8 @@ class vect_pattern
>    public:
>
>      /* Create a new instance of the pattern matcher class of the given type.  */
> -    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> +    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
> +                                   slp_compat_nodes_map_t *, slp_tree *);
>
>      /* Build the pattern from the data collected so far.  */
>      virtual void build (vec_info *) = 0;
> @@ -2352,6 +2360,7 @@ class vect_pattern
>
>  /* Function pointer to create a new pattern matcher from a generic type.  */
>  typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
> +                                             slp_compat_nodes_map_t *,
>                                               slp_tree *);
>
>  /* List of supported pattern matchers.  */
>
>
> --
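The fixed-pattern check in vect_validate_multiplication can be modelled on its own. The sketch below is not GCC code. It copies the `perms[][]` table and the `neg0`/`neg1` handling from the patch. `select_perm_row`, `Selection` and `first_pair_matches` are hypothetical helpers. The enumerator order of `complex_perm_kinds_t` is assumed. The `is_eq_or_top` test on op2/op3 is omitted because its new definition is not part of this excerpt.

```cpp
#include <cassert>

// Enumerator names mirror the patch; their order here is an assumption.
enum complex_perm_kinds_t { PERM_UNKNOWN, PERM_EVENODD, PERM_ODDEVEN,
                            PERM_ODDODD, PERM_EVENEVEN, PERM_TOP };
enum conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };

// Expected load permutes for op0..op3, one row per permute sequence
// (copied from the perms[][] table in the patch).
static const complex_perm_kinds_t perms[2][4]
  = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN },
      { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD } };

struct Selection { int perm; conj_status status; };

// Hypothetical model of the neg0/neg1 logic: a NEGATE_EXPR on exactly one
// right-hand operand marks a conjugate and may switch the permute row.
static Selection
select_perm_row (bool subtract, bool neg0, bool neg1)
{
  Selection s = { subtract ? 1 : 0, CONJ_NONE };
  if (neg0 && neg1)
    ;  // Both negated: not treated as a conjugate.
  else if (neg0)
    {
      s.status = CONJ_FST;
      if (subtract)
        s.perm = 0;
    }
  else if (neg1)
    {
      s.status = CONJ_SND;
      s.perm = 1;
    }
  return s;
}

// The "cheapest test first" step: op0 and op1 must match the row exactly.
static bool
first_pair_matches (int perm, complex_perm_kinds_t k0, complex_perm_kinds_t k1)
{
  return k0 == perms[perm][0] && k1 == perms[perm][1];
}
```

For a plain multiply (`subtract` false, no negation), row 0 applies, so op0 and op1 must be PERM_EVENEVEN and PERM_ODDODD. A negated second right-hand operand selects row 1 and reports CONJ_SND.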


* RE: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms
  2021-12-20 16:20       ` Tamar Christina
@ 2022-01-11  7:10         ` Tamar Christina
  2022-02-01  9:55           ` Tamar Christina
  2022-02-01 11:04         ` Richard Sandiford
  1 sibling, 1 reply; 18+ messages in thread
From: Tamar Christina @ 2022-01-11  7:10 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Monday, December 20, 2021 4:21 PM
> To: Richard Sandiford <richard.sandiford@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw 
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft 
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: RE: [2/3 PATCH]AArch64 use canonical ordering for complex 
> mul, fma and fms
> 
> 
> 
> > -----Original Message-----
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Friday, December 17, 2021 4:49 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw 
> > <Richard.Earnshaw@arm.com>; Marcus Shawcroft 
> > <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Subject: Re: [2/3 PATCH]AArch64 use canonical ordering for complex 
> > mul, fma and fms
> >
> > Richard Sandiford <richard.sandiford@arm.com> writes:
> > > Tamar Christina <tamar.christina@arm.com> writes:
> > >> Hi All,
> > >>
> > >> After the first patch in the series this updates the optabs to 
> > >> expect the canonical sequence.
> > >>
> > >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >>
> > >> Ok for master? and backport along with the first patch?
> > >>
> > >> Thanks,
> > >> Tamar
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >> 	PR tree-optimization/102819
> > >> 	PR tree-optimization/103169
> > >> 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4,
> > >> 	cmul<conj_op><mode>3): Use canonical order.
> > >> 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4,
> > >> 	cmul<conj_op><mode>3): Likewise.
> > >>
> > >> --- inline copy of patch --
> > >> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > >> index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9cfb5649f9b0e73 100644
> > >> --- a/gcc/config/aarch64/aarch64-simd.md
> > >> +++ b/gcc/config/aarch64/aarch64-simd.md
> > >> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
> > >>  ;; remainder.  Because of this, expand early.
> > >>  (define_expand "cml<fcmac1><conj_op><mode>4"
> > >>    [(set (match_operand:VHSDF 0 "register_operand")
> > >> -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> > >> -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
> > >> -				   (match_operand:VHSDF 3 "register_operand")]
> > >> -				   FCMLA_OP)))]
> > >> +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> > >> +				   (match_operand:VHSDF 2 "register_operand")]
> > >> +				   FCMLA_OP)
> > >> +		    (match_operand:VHSDF 3 "register_operand")))]
> > >>    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
> > >>  {
> > >>    rtx tmp = gen_reg_rtx (<MODE>mode);
> > >> -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
> > >> -						 operands[3], operands[2]));
> > >> +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
> > >> +						 operands[1], operands[2]));
> > >>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
> > >> -						 operands[3], operands[2]));
> > >> +						 operands[1], operands[2]));
> > >>    DONE;
> > >>  })
> > >>
> > >> @@ -583,9 +583,9 @@ (define_expand "cmul<conj_op><mode>3"
> > >>    rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
> > >>    rtx res1 = gen_reg_rtx (<MODE>mode);
> > >>    emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
> > >> -						 operands[2], operands[1]));
> > >> +						 operands[1], operands[2]));
> > >>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
> > >> -						 operands[2], operands[1]));
> > >> +						 operands[1], operands[2]));
> > >
> > > This doesn't look right.  Going from the documentation, patch 1 
> > > isn't changing the operand order for CMUL: the conjugated operand 
> > > (if there is one) is still operand 2.  The FCMLA sequences use the 
> > > opposite order, where the conjugated operand (if there is one) is
> > > operand 1.
> > > So I think
> >
> > I meant “the first multiplication operand” rather than “operand 1” here.
> >
> > > the reversal here is still needed.
> > >
> > > Same for the multiplication operands in CML* above.
> 
> I did actually change the order in patch 1, but didn't update the docs.
> That was done because I followed the SLP order again, but now I've 
> updated them to do what the docs say.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? and backport along with the first patch?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/102819
> 	PR tree-optimization/103169
> 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4): Use
> 	canonical order.
> 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4):
> 	Likewise.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..9e41610fba85862ef7675bea1e5731b14cab59ce 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
>  ;; remainder.  Because of this, expand early.
>  (define_expand "cml<fcmac1><conj_op><mode>4"
>    [(set (match_operand:VHSDF 0 "register_operand")
> -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
> -				   (match_operand:VHSDF 3 "register_operand")]
> -				   FCMLA_OP)))]
> +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> +				   (match_operand:VHSDF 2 "register_operand")]
> +				   FCMLA_OP)
> +		    (match_operand:VHSDF 3 "register_operand")))]
>    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
> -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
> -						 operands[3], operands[2]));
> +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
> +						 operands[2], operands[1]));
>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
> -						 operands[3], operands[2]));
> +						 operands[2], operands[1]));
>    DONE;
>  })
> 
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 9ef968840c20a3049901b3f8a919cf27ded1da3e..9ed19017c480b88779e9e3b08c0e031be60a8c12 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -7278,11 +7278,11 @@ (define_expand "cml<fcmac1><conj_op><mode>4"
>    rtx tmp = gen_reg_rtx (<MODE>mode);
>    emit_insn
>      (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, operands[4],
> -					     operands[3], operands[2],
> -					     operands[1], operands[5]));
> +					     operands[2], operands[1],
> +					     operands[3], operands[5]));
>    emit_insn
>      (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], operands[4],
> -					     operands[3], operands[2],
> +					     operands[2], operands[1],
>  					     tmp, operands[5]));
>    DONE;
>  })
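Richard's point about operand order can be checked with a scalar model of one FCMLA lane pair. This sketch assumes the Arm ARM definition: rotations 0 and 180 use the real part of the first source, and rotations 90 and 270 use its imaginary part. It also assumes the conjugate expanders use rotations 0 and 270. `cplx`, `fcmla` and `cmla_conj_model` are hypothetical names, not GCC or ACLE functions.

```cpp
#include <cassert>

// One complex element: even lane = real part, odd lane = imaginary part.
struct cplx { double re, im; };

// Hypothetical scalar model of FCMLA: ACC += partial product of N (first
// source) and M (second source), selected by the rotation.
static cplx
fcmla (cplx acc, cplx n, cplx m, int rot)
{
  switch (rot)
    {
    case 0:   acc.re += n.re * m.re; acc.im += n.re * m.im; break;
    case 90:  acc.re -= n.im * m.im; acc.im += n.im * m.re; break;
    case 180: acc.re -= n.re * m.re; acc.im -= n.re * m.im; break;
    case 270: acc.re += n.im * m.im; acc.im -= n.im * m.re; break;
    default:  break;  // Other rotations are not encodable.
    }
  return acc;
}

// Models the revised conjugate expansion: both FCMLAs take operands[2] as
// the first source and operands[1] as the second, accumulating into
// operands[3].  The result is op1 * conj (op2) + op3.
static cplx
cmla_conj_model (cplx op1, cplx op2, cplx op3)
{
  cplx tmp = fcmla (op3, op2, op1, 0);
  return fcmla (tmp, op2, op1, 270);
}
```

Rotations 0 and 270 together add conj(first source) * (second source). The first FCMLA source is therefore the conjugated one. With the revised order, `cmla_conj_model ({1, 2}, {3, 4}, {5, 6})` gives `{16, 8}`, which is (1+2i)·conj(3+4i) + (5+6i). Passing the sources the other way round conjugates operand 1 instead and gives `{16, 4}`.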


* RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms
  2021-12-20 16:22   ` Tamar Christina
@ 2022-01-11  7:10     ` Tamar Christina
  2022-02-01  9:54       ` Tamar Christina
  2022-02-01  9:56     ` Kyrylo Tkachov
  1 sibling, 1 reply; 18+ messages in thread
From: Tamar Christina @ 2022-01-11  7:10 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana Radhakrishnan, Richard Earnshaw, nickc, Kyrylo Tkachov

ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Monday, December 20, 2021 4:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul,
> fma and fms
> 
> Updated version of patch following AArch64 review.
> 
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master? and backport along with the first patch?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/102819
> 	PR tree-optimization/103169
> 	* config/arm/vec-common.md (cml<fcmac1><conj_op><mode>4): Use
> 	canonical order.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index e71d9b3811fde62159f5c21944fef9fe3f97b4bd..eab77ac8decce76d70f5b2594f4439e6ed363e6e 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -265,18 +265,18 @@ (define_expand "arm_vcmla<rot><mode>"
>  ;; remainder.  Because of this, expand early.
>  (define_expand "cml<fcmac1><conj_op><mode>4"
>    [(set (match_operand:VF 0 "register_operand")
> -	(plus:VF (match_operand:VF 1 "register_operand")
> -		 (unspec:VF [(match_operand:VF 2 "register_operand")
> -			     (match_operand:VF 3 "register_operand")]
> -			    VCMLA_OP)))]
> +	(plus:VF (unspec:VF [(match_operand:VF 1 "register_operand")
> +			     (match_operand:VF 2 "register_operand")]
> +			    VCMLA_OP)
> +		 (match_operand:VF 3 "register_operand")))]
>    "(TARGET_COMPLEX || (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT
>  		      && ARM_HAVE_<MODE>_ARITH)) && !BYTES_BIG_ENDIAN"
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
> -  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[1],
> -					     operands[3], operands[2]));
> +  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[3],
> +					     operands[2], operands[1]));
>    emit_insn (gen_arm_vcmla<rotsplit2><mode> (operands[0], tmp,
> -					     operands[3], operands[2]));
> +					     operands[2], operands[1]));
>    DONE;
>  })
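For reference, the expander above splits the operation into two VCMLA instructions whose rotations come from <rotsplit1> and <rotsplit2>. Below is a minimal scalar sketch of why two partial steps are enough for the plain (non-conjugate) multiply-accumulate. It assumes the architectural FCMLA semantics for rotations #0 and #90 and models one complex element as a {re, im} pair. The type and function names are hypothetical, and the operand roles are not meant to match the operand numbering in the pattern:

```c
#include <assert.h>

/* Hypothetical scalar model of one complex lane pair.  */
typedef struct { float re, im; } cplx;

/* Models FCMLA with rotation #0: only the real part of N is used.  */
static cplx
fcmla_rot0 (cplx acc, cplx n, cplx m)
{
  acc.re += n.re * m.re;
  acc.im += n.re * m.im;
  return acc;
}

/* Models FCMLA with rotation #90: only the imaginary part of N is used.  */
static cplx
fcmla_rot90 (cplx acc, cplx n, cplx m)
{
  acc.re -= n.im * m.im;
  acc.im += n.im * m.re;
  return acc;
}

/* Two chained steps give n * m + acc, with the accumulator passed last
   as in fma (x, y, z) = x * y + z.  */
static cplx
vcmla_pair_ref (cplx n, cplx m, cplx acc)
{
  cplx tmp = fcmla_rot0 (acc, n, m);
  return fcmla_rot90 (tmp, n, m);
}
```

For example, (1 + 2i) * (3 + 4i) + (1 + 1i) yields -4 + 11i: the #0 step contributes 3 + 4i and the #90 step contributes -8 + 6i.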



* RE: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
  2022-01-10 13:00 ` Richard Biener
@ 2022-01-11  7:31   ` Tamar Christina
From: Tamar Christina @ 2022-01-11  7:31 UTC
  To: Richard Biener; +Cc: GCC Patches, nd, Richard Guenther



> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, January 10, 2022 1:00 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Richard
> Guenther <rguenther@suse.de>
> Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex
> numbers validation routines.
> 
> On Fri, Dec 17, 2021 at 4:44 PM Tamar Christina via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This patch boosts the analysis for complex mul, fma and fms in order to ensure
> > that it doesn't create an incorrect output.
> >
> > Essentially it adds an extra check that the two nodes it is going to combine
> > perform the same operations on compatible values.  This is needed because if
> > one computation differs from the other, the current implementation has no way
> > to deal with it, since we have to remove the permute.
> >
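A toy model of the kind of check described above: two trees are treated as compatible if they have the same operation at every node and identical leaf values, and the result is memoized per pair of nodes. This is only a sketch under simplifying assumptions (small integer node ids, a char as the operation code, hypothetical names). The real compatible_complex_nodes_p in the patch additionally compares gimple codes, call compatibility, the data references of loads and permuted external operands:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical toy expression node: an operation code plus up to two
   children.  Leaves (nchildren == 0) carry a value that must match.  */
typedef struct node
{
  int id;               /* Unique index below MAX_NODES, used as cache key.  */
  char code;            /* '+', '-', '*', or 'v' for a leaf value.  */
  int value;            /* Only meaningful for leaves.  */
  int nchildren;
  struct node *kids[2];
} node;

/* Memo table indexed by (a->id, b->id): 0 = not seen, 1 = compatible,
   2 = incompatible or still being checked.  */
static unsigned char cache[MAX_NODES][MAX_NODES];

static bool
compatible_p (const node *a, const node *b)
{
  unsigned char *slot = &cache[a->id][b->id];
  if (*slot != 0)
    return *slot == 1;

  /* Pessimistically record "incompatible" before recursing, like the
     compat_cache->put (key, false) at the start of the patch's function,
     so a pair revisited while still being checked cannot loop forever.  */
  *slot = 2;

  if (a->code != b->code || a->nchildren != b->nchildren)
    return false;
  if (a->nchildren == 0 && a->value != b->value)
    return false;
  for (int i = 0; i < a->nchildren; i++)
    if (!compatible_p (a->kids[i], b->kids[i]))
      return false;

  *slot = 1;
  return true;
}
```

Caching per pair keeps the walk linear in the number of distinct node pairs, even when SLP nodes are shared between several parents.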
> > Once we can keep the permute around, we can probably handle these cases by
> > unrolling.
> >
> > Since I had to do the traversal anyway, I took the opportunity to simplify
> > the code a bit.  Previously we would determine whether something is a
> > conjugate, then try to figure out which conjugate it is, and then check
> > whether the permutes match what we expect.
> >
> > Now the code that does the traversal detects this in one go and tells us
> > whether the operation can be combined and whether a conjugate is present.
> >
> > Secondly, because it does this, the checking code itself can now be
> > simplified to essentially applying fixed patterns to each operation.
> >
> > The patterns represent the order in which operations should appear.  For
> > instance, a complex MUL operation combines:
> >
> >   Left 1 + Right 1
> >   Left 2 + Right 2
> >
> > with a permute on the nodes consisting of:
> >
> >   { Even, Even } + { Odd, Odd  }
> >   { Even, Odd  } + { Odd, Even }
> >
> > By abstracting over these patterns the checking code becomes quite simple.
> >
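As a concrete reading of these patterns, here is a minimal scalar sketch of one complex multiply written as the two lane-wise products the SLP tree contains. It assumes interleaved storage (real part in the even lane, imaginary part in the odd lane) and uses hypothetical names. Pairing L1 with R1 (both taken from a) and L2 with R2 (both from b) follows the first entry of the styles/perms tables in the patch. This illustrates the lane layout only and is not the vectorizer's matching code:

```c
#include <assert.h>

/* Hypothetical two-lane vector holding one complex element: lane 0
   (even) is the real part, lane 1 (odd) the imaginary part.  */
typedef struct { float l[2]; } v2;

enum { EVEN = 0, ODD = 1 };

/* Build { x[first], x[second] }, i.e. a two-lane load permutation.  */
static v2
perm (v2 x, int first, int second)
{
  v2 r = { { x.l[first], x.l[second] } };
  return r;
}

/* Complex multiply expressed as two lane-wise products:
     L1 * L2 = {Even, Even}(a) * {Even, Odd}(b) = { a.re*b.re, a.re*b.im }
     R1 * R2 = {Odd,  Odd }(a) * {Odd, Even}(b) = { a.im*b.im, a.im*b.re }
   combined with a lane-wise { -, + }.  */
static v2
cmul_lanes (v2 a, v2 b)
{
  v2 l1 = perm (a, EVEN, EVEN), l2 = perm (b, EVEN, ODD);
  v2 r1 = perm (a, ODD, ODD), r2 = perm (b, ODD, EVEN);
  v2 out;
  out.l[0] = l1.l[0] * l2.l[0] - r1.l[0] * r2.l[0];
  out.l[1] = l1.l[1] * l2.l[1] + r1.l[1] * r2.l[1];
  return out;
}
```

The second layout in the patch, {L1, R2} + {L2, R1}, appears to cover the same computation with the operands of the second product swapped.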
> > As part of this I was checking the order of the operands, which was left in
> > "SLP" order, i.e. the order in which they showed up during SLP, which means
> > that the accumulator is first.  However, it looks like I didn't document this,
> > and the x86 optab was implemented assuming the same order as FMA, i.e. that
> > the accumulator is last.
> >
> > I have thus changed the order to match that of FMA and FMS, which corrects
> > the x86 codegen, and will update the Arm targets.  The order is now also
> > documented.
> >
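In other words, the optabs now take their operands in the same order as fma (x, y, z) = x * y + z, with the accumulator last. Below is a minimal scalar reference for the complex multiply-accumulate and its conjugate form, written with C99 <complex.h>. The function names are hypothetical, and the sketch does not model the vector optab interface itself:

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical scalar reference: accumulator LAST, as in fma.  */
static float complex
complex_fma_ref (float complex x, float complex y, float complex acc)
{
  return x * y + acc;
}

/* Same, but with the second multiplicand conjugated.  */
static float complex
complex_fma_conj_ref (float complex x, float complex y, float complex acc)
{
  return x * conjf (y) + acc;
}
```

If a target treated the first operand as the accumulator while the vectorizer passed it last, it would presumably compute x * y's neighbours in the wrong roles (for example acc * x + y), which is the kind of mismatch described above for x86.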
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > x86_64-pc-linux-gnu and no regressions.
> >
> > Ok for master? and backport to GCC 11 after some stew?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >         PR tree-optimization/102819
> >         PR tree-optimization/103169
> >         * doc/md.texi: Update docs for cfms, cfma.
> >         * tree-data-ref.h (same_data_refs): Accept optional offset.
> >         * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
> >         patterns.
> >         (vect_normalize_conj_loc): Remove.
> >         (is_eq_or_top): Change to take two nodes.
> >         (enum _conj_status, compatible_complex_nodes_p,
> >         vect_validate_multiplication): New.
> >         (class complex_add_pattern, complex_add_pattern::matches,
> >         complex_add_pattern::recognize, class complex_mul_pattern,
> >         complex_mul_pattern::recognize, class complex_fms_pattern,
> >         complex_fms_pattern::recognize, class complex_operations_pattern,
> >         complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
> >         new cache.
> >         (complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
> >         cache and use new validation code.
> >         * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
> >         vect_analyze_slp): Pass along cache.
> >         (compatible_calls_p): Expose.
> >         * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
> >         slp_compat_nodes_map_t): New.
> >         (class vect_pattern): Update signatures to include new cache.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR tree-optimization/102819
> >         PR tree-optimization/103169
> >         * g++.dg/vect/pr99149.cc: xfail for now.
> >         * gcc.dg/vect/complex/pr102819-1.c: New test.
> >         * gcc.dg/vect/complex/pr102819-2.c: New test.
> >         * gcc.dg/vect/complex/pr102819-3.c: New test.
> >         * gcc.dg/vect/complex/pr102819-4.c: New test.
> >         * gcc.dg/vect/complex/pr102819-5.c: New test.
> >         * gcc.dg/vect/complex/pr102819-6.c: New test.
> >         * gcc.dg/vect/complex/pr102819-7.c: New test.
> >         * gcc.dg/vect/complex/pr102819-8.c: New test.
> >         * gcc.dg/vect/complex/pr102819-9.c: New test.
> >         * gcc.dg/vect/complex/pr103169.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
> >  a multiply and accumulate of complex numbers.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] += a[i] * b[i];
> > +      op2[i] += op0[i] * op1[i];
> >      @}
> >  @end smallexample
> >
> > @@ -6348,12 +6348,12 @@ the same as a multiply and accumulate of complex numbers where the second
> >  multiply arguments is conjugated.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] += a[i] * conj (b[i]);
> > +      op2[i] += op0[i] * conj (op1[i]);
> >      @}
> >  @end smallexample
> >
> > @@ -6370,12 +6370,12 @@ Perform a vector multiply and subtract that is semantically the same as
> >  a multiply and subtract of complex numbers.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] -= a[i] * b[i];
> > +      op2[i] -= op0[i] * op1[i];
> >      @}
> >  @end smallexample
> >
> > @@ -6393,12 +6393,12 @@ the same as a multiply and subtract of complex numbers where the second
> >  multiply arguments is conjugated.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] -= a[i] * conj (b[i]);
> > +      op2[i] -= op0[i] * conj (op1[i]);
> >      @}
> >  @end smallexample
> >
> > @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
> >  complex numbers.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] = a[i] * b[i];
> > +      op2[i] = op0[i] * op1[i];
> >      @}
> >  @end smallexample
> >
> > @@ -6437,12 +6437,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
> >  multiply of complex numbers where the second multiply arguments is conjugated.
> >
> >  @smallexample
> > -  complex TYPE c[N];
> > -  complex TYPE a[N];
> > -  complex TYPE b[N];
> > +  complex TYPE op0[N];
> > +  complex TYPE op1[N];
> > +  complex TYPE op2[N];
> >    for (int i = 0; i < N; i += 1)
> >      @{
> > -      c[i] = a[i] * conj (b[i]);
> > +      op2[i] = op0[i] * conj (op1[i]);
> >      @}
> >  @end smallexample
> >
> > diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
> > index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
> > --- a/gcc/testsuite/g++.dg/vect/pr99149.cc
> > +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
> > @@ -24,4 +24,4 @@ public:
> >  } n;
> >  main() { n.j(); }
> >
> > -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad1(float v1, float v2)
> > +{
> > +  for (int r = 0; r < 100; r += 4)
> > +    {
> > +      int i = r + 1;
> > +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> > +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> > +      f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
> > +      f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
> > +      //                  ^^^^^^^             ^^^^^^^
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad1(float v1, float v2)
> > +{
> > +  for (int r = 0; r < 100; r += 2)
> > +    {
> > +      int i = r + 1;
> > +      f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
> > +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void good1(float v1, float v2)
> > +{
> > +  for (int r = 0; r < 100; r += 2)
> > +    {
> > +      int i = r + 1;
> > +      f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> > +      f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void good1()
> > +{
> > +  for (int r = 0; r < 100; r += 2)
> > +    {
> > +      int i = r + 1;
> > +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
> > +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void good2()
> > +{
> > +  for (int r = 0; r < 100; r += 2)
> > +    {
> > +      int i = r + 1;
> > +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
> > +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad1()
> > +{
> > +  for (int r = 0; r < 100; r += 2)
> > +    {
> > +      int i = r + 1;
> > +      f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
> > +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
> > +      //                  ^^^^^^^             ^^^^^^^
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad2()
> > +{
> > +  for (int r = 0; r < 100; r += 2)
> > +    {
> > +      int i = r + 1;
> > +      f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
> > +      f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
> > +      //                          ^^^^
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad3()
> > +{
> > +  for (int r = 0; r < 100; r += 2)
> > +    {
> > +      int i = r + 1;
> > +      f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
> > +      f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> > +      //                            ^^^^^^^
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +#include <stdio.h>
> > +#include <complex.h>
> > +
> > +#define N 200
> > +#define TYPE float
> > +#define TYPE2 float
> > +
> > +void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > +  for (int i=0; i < N; i++)
> > +    {
> > +      c[i] -=  a[i] * b[0];
> > +    }
> > +}
> > +
> > +/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS.  */
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target { vect_double } } } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
> > +
> > +_Complex double b_0, c_0;
> > +
> > +void
> > +mul270snd (void)
> > +{
> > +  c_0 = b_0 * 1.0iF * 1.0iF;
> > +}
> > +
> > diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> > index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
> > --- a/gcc/tree-data-ref.h
> > +++ b/gcc/tree-data-ref.h
> > @@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
> >  }
> >
> >  /* Return true when the data references A and B are accessing the same
> > -   memory object with the same access functions.  */
> > +   memory object with the same access functions.  Optionally skip the
> > +   last OFFSET dimensions in the data reference.  */
> 
> But you skip the _first_ dimensions?

That's because the dimensions seem to be laid out in reverse order, i.e.
float f[12][200] with an access as f[1][r] gets a DR as:

>>> p debug (dr1)
#(Data Ref:
#  bb: 3
#  stmt: _1 = f[1][r_20];
#  ref: f[1][r_20];
#  base_object: f;
#  Access function 0: {0, +, 2}_1
#  Access function 1: 1
#)

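So index 0 holds the outermost ARRAY_REF, which is the last (rightmost)
subscript as written in the source; skipping the first OFFSET access
functions therefore skips the last OFFSET dimensions.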

Cheers,
Tamar
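A toy model of the patched same_data_refs, illustrating why starting the loop at OFFSET skips the *last* dimensions given this layout. The struct and function names are hypothetical, access functions are reduced to constants, and the checks the real function performs on the base objects are collapsed into a single id comparison:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a data reference: a base object id plus its
   access functions in the order the dump above shows, i.e. index 0 is
   the last (rightmost) subscript, so f[1][r] is stored as { r, 1 }.  */
typedef struct
{
  int base;
  int ndims;
  int access_fn[4];
} toy_dr;

/* Compare access functions starting at OFFSET, like the patched loop,
   so OFFSET == 1 ignores the rightmost subscript.  */
static bool
toy_same_data_refs (const toy_dr *a, const toy_dr *b, int offset)
{
  if (a->base != b->base || a->ndims != b->ndims)
    return false;
  for (int i = offset; i < a->ndims; i++)
    if (a->access_fn[i] != b->access_fn[i])
      return false;
  return true;
}
```

With OFFSET == 1, f[1][0] and f[1][1] compare equal (same row), while f[1][0] and f[2][0] do not, which is the distinction the pr102819-6 test relies on.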

> 
> Otherwise looks OK to me.
> 
> Thanks,
> Richard.
> 
> >  static inline bool
> > -same_data_refs (data_reference_p a, data_reference_p b)
> > +same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
> >  {
> >    unsigned int i;
> >
> > @@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
> >    if (!same_data_refs_base_objects (a, b))
> >      return false;
> >
> > -  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> > +  for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
> >      if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
> >        return false;
> >
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > index 0350441fad9690cd5d04337171ca3470a064a571..f8da4153632a700680091f37305a5d3078fbb0c5 100644
> > --- a/gcc/tree-vect-slp-patterns.c
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
> >    int valid_patterns = 4;
> >    FOR_EACH_VEC_ELT (loads, i, load)
> >      {
> > -      if (candidates[0] != PERM_UNKNOWN && load != 1)
> > +      unsigned adj_load = load % 2;
> > +      if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
> >         {
> >           candidates[0] = PERM_UNKNOWN;
> >           valid_patterns--;
> >         }
> > -      if (candidates[1] != PERM_UNKNOWN && load != 0)
> > +      if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
> >         {
> >           candidates[1] = PERM_UNKNOWN;
> >           valid_patterns--;
> > @@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
> >    public:
> >      void build (vec_info *);
> >      static internal_fn
> > -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> > -            vec<slp_tree> *);
> > +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > +            slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> >
> >      static vect_pattern*
> > -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > +              slp_tree *);
> >
> >      static vect_pattern*
> >      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > @@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
> >  internal_fn
> >  complex_add_pattern::matches (complex_operation_t op,
> >                               slp_tree_to_load_perm_map_t *perm_cache,
> > +                             slp_compat_nodes_map_t * /* compat_cache */,
> >                               slp_tree *node, vec<slp_tree> *ops)
> >  {
> >    internal_fn ifn = IFN_LAST;
> > @@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
> >
> >  vect_pattern*
> >  complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > +                               slp_compat_nodes_map_t *compat_cache,
> >                                 slp_tree *node)
> >  {
> >    auto_vec<slp_tree> ops;
> >    complex_operation_t op
> >      = vect_detect_pair_op (*node, true, &ops);
> >    internal_fn ifn
> > -    = complex_add_pattern::matches (op, perm_cache, node, &ops);
> > +    = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
> >    if (ifn == IFN_LAST)
> >      return NULL;
> >
> > @@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> >   * complex_mul_pattern
> >  ******************************************************************************/
> >
> > -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
> > -   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> > -
> > -   If a negate is found then the values in ARGS are reordered such that the
> > -   negate node is always the second one and the entry is replaced by the child
> > -   of the negate node.  */
> > +/* Helper function to check if PERM is KIND or PERM_TOP.  */
> >
> >  static inline bool
> > -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL)
> > +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
> > +             slp_tree op1, complex_perm_kinds_t kind1,
> > +             slp_tree op2, complex_perm_kinds_t kind2)
> >  {
> > -  gcc_assert (args.length () == 2);
> > -  bool neg_found = false;
> > -
> > -  if (vect_match_expression_p (args[0], NEGATE_EXPR))
> > -    {
> > -      std::swap (args[0], args[1]);
> > -      neg_found = true;
> > -      if (neg_first_p)
> > -       *neg_first_p = true;
> > -    }
> > -  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> > -    {
> > -      neg_found = true;
> > -      if (neg_first_p)
> > -       *neg_first_p = false;
> > -    }
> > +  complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
> > +  if (perm1 != kind1 && perm1 != PERM_TOP)
> > +    return false;
> >
> > -  if (neg_found)
> > -    args[1] = SLP_TREE_CHILDREN (args[1])[0];
> > +  complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
> > +  if (perm2 != kind2 && perm2 != PERM_TOP)
> > +    return false;
> >
> > -  return neg_found;
> > +  return true;
> >  }
> >
> > -/* Helper function to check if PERM is KIND or PERM_TOP.  */
> > +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
> >
> >  static inline bool
> > -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
> > +compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
> > +                           slp_tree a, int *pa, slp_tree b, int *pb)
> >  {
> > -  return perm == kind || perm == PERM_TOP;
> > -}
> > +  bool *tmp;
> > +  std::pair<slp_tree, slp_tree> key = std::make_pair(a, b);
> > +  if ((tmp = compat_cache->get (key)) != NULL)
> > +    return *tmp;
> >
> > -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
> > -   nodes but also that they represent an operation that is either a complex
> > -   multiplication or a complex multiplication by conjugated value.
> > +   compat_cache->put (key, false);
> >
> > -   Of the negation is expected to be in the first half of the tree (As required
> > -   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjugate
> > -   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
> > -   second operand contains the conjugate operation.  */
> > +  if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
> > +    return false;
> >
> > -static inline bool
> > -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> > -                             const vec<slp_tree> &left_op,
> > -                             const vec<slp_tree> &right_op,
> > -                            bool neg_first, bool *conj_first_operand,
> > -                            bool fms)
> > -{
> > -  /* The presence of a negation indicates that we have either a conjugate or a
> > -     rotation.  We need to distinguish which one.  */
> > -  *conj_first_operand = false;
> > -  complex_perm_kinds_t kind;
> > -
> > -  /* Complex conjugates have the negation on the imaginary part of the
> > -     number where rotations affect the real component.  So check if the
> > -     negation is on a dup of lane 1.  */
> > -  if (fms)
> > +  if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
> > +    return false;
> > +
> > +  /* Only internal nodes can be loads, as such we can't check further if they
> > +     are externals.  */
> > +  if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
> >      {
> > -      /* Canonicalization for fms is not consistent. So have to test both
> > -        variants to be sure.  This needs to be fixed in the mid-end so
> > -        this part can be simpler.  */
> > -      kind = linear_loads_p (perm_cache, right_op[0]);
> > -      if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
> > -          && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> > -                            PERM_ODDEVEN))
> > -         || (kind == PERM_ODDEVEN
> > -             && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> > -                            PERM_ODDODD))))
> > -       return false;
> > +      for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
> > +       {
> > +         tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
> > +         tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
> > +         if (!operand_equal_p (op1, op2, 0))
> > +           return false;
> > +       }
> > +
> > +      compat_cache->put (key, true);
> > +      return true;
> > +    }
> > +
> > +  auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
> > +  auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
> > +
> > +  if (gimple_code (a_stmt) != gimple_code (b_stmt))
> > +    return false;
> > +
> > +  /* code, children, type, externals, loads, constants  */
> > +  if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
> > +    return false;
> > +
> > +  /* At this point, a and b are known to be the same gimple operations.  */
> > +  if (is_gimple_call (a_stmt))
> > +    {
> > +       if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
> > +                                dyn_cast <gcall *> (b_stmt)))
> > +         return false;
> >      }
> > +  else if (!is_gimple_assign (a_stmt))
> > +    return false;
> >    else
> >      {
> > -      if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
> > -         && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
> > -                           PERM_ODDEVEN))
> > +      tree_code acode = gimple_assign_rhs_code (a_stmt);
> > +      tree_code bcode = gimple_assign_rhs_code (b_stmt);
> > +      if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
> > +         && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
> > +       return true;
> > +
> > +      if (acode != bcode)
> >         return false;
> >      }
> >
> > -  /* Deal with differences in indexes.  */
> > -  int index1 = fms ? 1 : 0;
> > -  int index2 = fms ? 0 : 1;
> > -
> > -  /* Check if the conjugate is on the second first or second operand.  The
> > -     order of the node with the conjugate value determines this, and the dup
> > -     node must be one of lane 0 of the same DR as the neg node.  */
> > -  kind = linear_loads_p (perm_cache, left_op[index1]);
> > -  if (kind == PERM_TOP)
> > +  if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
> > +      || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
> >      {
> > -      if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
> > -       return true;
> > +      for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
> > +       {
> > +         tree t1 = gimple_arg (a_stmt, i);
> > +         tree t2 = gimple_arg (b_stmt, i);
> > +         if (TREE_CODE (t1) != TREE_CODE (t2))
> > +           return false;
> > +
> > +         /* If SSA name then we will need to inspect the children
> > +            so we can punt here.  */
> > +         if (TREE_CODE (t1) == SSA_NAME)
> > +           continue;
> > +
> > +         if (!operand_equal_p (t1, t2, 0))
> > +           return false;
> > +       }
> >      }
> > -  else if (kind == PERM_EVENODD && !neg_first)
> > +  else
> >      {
> > -      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN)
> > +      auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
> > +      auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
> > +      /* Don't check the last dimension as that's checked by the linearity
> > +        checks.  This check is also much stricter than what we need
> > +        because it doesn't consider loading from adjacent elements
> > +        in the same struct as loading from the same base object.
> > +        But for now, I'll play it safe.  */
> > +      if (!same_data_refs (dr1, dr2, 1))
> >         return false;
> > -      return true;
> >      }
> > -  else if (kind == PERM_EVENEVEN && neg_first)
> > +
> > +  for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
> >      {
> > -      if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD)
> > +      if (!compatible_complex_nodes_p (compat_cache,
> > +                                      SLP_TREE_CHILDREN (a)[i], pa,
> > +                                      SLP_TREE_CHILDREN (b)[i], pb))
> >         return false;
> > -
> > -      *conj_first_operand = true;
> > -      return true;
> >      }
> > -  else
> > -    return false;
> > -
> > -  if (kind != PERM_EVENEVEN)
> > -    return false;
> >
> > +  compat_cache->put (key, true);
> >    return true;
> >  }
> >
> > -/* Helper function to help distinguish between a conjugate and a rotation in a
> > -   complex multiplication.  The operations have similar shapes but the order of
> > -   the load permutes are different.  This function returns TRUE when the order
> > -   is consistent with a multiplication or multiplication by conjugated
> > -   operand but returns FALSE if it's a multiplication by rotated operand.  */
> > -
> >  static inline bool
> >  vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> > -                             const vec<slp_tree> &op,
> > -                             complex_perm_kinds_t permKind)
> > +                             slp_compat_nodes_map_t *compat_cache,
> > +                             vec<slp_tree> &left_op,
> > +                             vec<slp_tree> &right_op,
> > +                             bool subtract,
> > +                             enum _conj_status *_status)
> >  {
> > -  /* The left node is the more common case, test it first.  */
> > -  if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
> > +  auto_vec<slp_tree> ops;
> > +  enum _conj_status stats = CONJ_NONE;
> > +
> > +  /* The complex operations can occur in two layouts and two permute sequences
> > +     so declare them and re-use them.  */
> > +  int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}.  */
> > +                   , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}.  */
> > +                   };
> > +
> > +  /* Now for the corresponding permutes that go with these values.  */
> > +  complex_perm_kinds_t perms[][4]
> > +    = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
> > +      , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
> > +      };
> > +
> > +  /* These permutes are used during comparisons of externals on which
> > +     we require strict equality.  */
> > +  int cq[][4][2]
> > +    = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
> > +      , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
> > +      };
> > +
> > +  /* Default to style and perm 0, most operations use this one.  */
> > +  int style = 0;
> > +  int perm = subtract ? 1 : 0;
> > +
> > +  /* Check if we have a negate operation, if so absorb the node and continue
> > +     looking.  */
> > +  bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
> > +  bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
> > +
> > +  /* Determine which style we're looking at.  We only have different ones
> > +     whenever a conjugate is involved.  */
> > +  if (neg0 && neg1)
> > +    ;
> > +  else if (neg0)
> >      {
> > -      if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
> > -       return false;
> > +      right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
> > +      stats = CONJ_FST;
> > +      if (subtract)
> > +       perm = 0;
> >      }
> > -  return true;
> > +  else if (neg1)
> > +    {
> > +      right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
> > +      stats = CONJ_SND;
> > +      perm = 1;
> > +    }
> > +
> > +  *_status = stats;
> > +
> > +  /* Flatten the inputs after we've remapped them.  */
> > +  ops.create (4);
> > +  ops.safe_splice (left_op);
> > +  ops.safe_splice (right_op);
> > +
> > +  /* Extract out the elements to check.  */
> > +  slp_tree op0 = ops[styles[style][0]];
> > +  slp_tree op1 = ops[styles[style][1]];
> > +  slp_tree op2 = ops[styles[style][2]];
> > +  slp_tree op3 = ops[styles[style][3]];
> > +
> > +  /* Do the cheapest test first.  If it fails there is no need to analyze further.  */
> > +  if (linear_loads_p (perm_cache, op0) != perms[perm][0]
> > +      || linear_loads_p (perm_cache, op1) != perms[perm][1]
> > +      || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
> > +    return false;
> > +
> > +  return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
> > +                                    cq[perm][1])
> > +        && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
> > +                                       cq[perm][3]);
> >  }
> >
> >  /* This function combines two nodes containing only even and only odd lanes
> > @@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern
> >    public:
> >      void build (vec_info *);
> >      static internal_fn
> > -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> > -            vec<slp_tree> *);
> > +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > +            slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> >
> >      static vect_pattern*
> > -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > +              slp_tree *);
> >
> >      static vect_pattern*
> >      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > @@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern
> >  internal_fn
> >  complex_mul_pattern::matches (complex_operation_t op,
> >                               slp_tree_to_load_perm_map_t *perm_cache,
> > +                             slp_compat_nodes_map_t *compat_cache,
> >                               slp_tree *node, vec<slp_tree> *ops)
> >  {
> >    internal_fn ifn = IFN_LAST;
> > @@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op,
> >        || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
> >      return IFN_LAST;
> >
> > -  bool neg_first = false;
> > -  bool conj_first_operand = false;
> > -  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> > +  enum _conj_status status;
> > +  if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
> > +                                    right_op, false, &status))
> > +    return IFN_LAST;
> >
> > -  if (!is_neg)
> > +  if (status == CONJ_NONE)
> >      {
> > -      /* A multiplication needs to multiply agains the real pair, otherwise
> > -        the pattern matches that of FMS.   */
> > -      if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
> > -         || vect_normalize_conj_loc (left_op))
> > -       return IFN_LAST;
> >        if (add0)
> >         ifn = IFN_COMPLEX_FMA;
> >        else
> > @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op,
> >      }
> >    else
> >      {
> > -      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> > -                                        neg_first, &conj_first_operand,
> > -                                        false))
> > -       return IFN_LAST;
> > -
> >        if(add0)
> >         ifn = IFN_COMPLEX_FMA_CONJ;
> >        else
> > @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op,
> >      ops->quick_push (add0);
> >
> >    complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
> > -  if (kind == PERM_EVENODD)
> > -    {
> > -      ops->quick_push (left_op[1]);
> > -      ops->quick_push (right_op[1]);
> > -      ops->quick_push (left_op[0]);
> > -    }
> > -  else if (kind == PERM_TOP)
> > +  if (kind == PERM_EVENODD || kind == PERM_TOP)
> >      {
> >        ops->quick_push (left_op[1]);
> >        ops->quick_push (right_op[1]);
> >        ops->quick_push (left_op[0]);
> >      }
> > -  else if (kind == PERM_EVENEVEN && !conj_first_operand)
> > +  else if (kind == PERM_EVENEVEN && status != CONJ_SND)
> >      {
> >        ops->quick_push (left_op[0]);
> >        ops->quick_push (right_op[0]);
> > @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op,
> >
> >  vect_pattern*
> >  complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > +                               slp_compat_nodes_map_t *compat_cache,
> >                                 slp_tree *node)
> >  {
> >    auto_vec<slp_tree> ops;
> >    complex_operation_t op
> >      = vect_detect_pair_op (*node, true, &ops);
> >    internal_fn ifn
> > -    = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> > +    = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
> >    if (ifn == IFN_LAST)
> >      return NULL;
> >
> > @@ -1097,8 +1156,8 @@ complex_mul_pattern::build (vec_info *vinfo)
> >
> >         /* First re-arrange the children.  */
> >         SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2);
> > -       SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2];
> > -       SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
> > +       SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
> > +       SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[2];
> >         break;
> >        }
> >      case IFN_COMPLEX_FMA:
> > @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo)
> >
> >         /* First re-arrange the children.  */
> >         SLP_TREE_CHILDREN (*this->m_node).safe_grow (3);
> > -       SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0];
> > +       SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
> >         SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3];
> > -       SLP_TREE_CHILDREN (*this->m_node)[2] = newnode;
> > +       SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0];
> >
> >         /* Tell the builder to expect an extra argument.  */
> >         this->m_num_args++;
> > @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern
> >    public:
> >      void build (vec_info *);
> >      static internal_fn
> > -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> > -            vec<slp_tree> *);
> > +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > +            slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> >
> >      static vect_pattern*
> > -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > +              slp_tree *);
> >
> >      static vect_pattern*
> >      mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > @@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern
> >  internal_fn
> >  complex_fms_pattern::matches (complex_operation_t op,
> >                               slp_tree_to_load_perm_map_t *perm_cache,
> > +                             slp_compat_nodes_map_t *compat_cache,
> >                               slp_tree * ref_node, vec<slp_tree> *ops)
> >  {
> >    internal_fn ifn = IFN_LAST;
> > @@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op,
> >    if (!vect_match_expression_p (root, MINUS_EXPR))
> >      return IFN_LAST;
> >
> > +  /* TODO: Support invariants here, with the new layout CADD now
> > +          can match before we get a chance to try CFMS.  */
> >    auto nodes = SLP_TREE_CHILDREN (root);
> >    if (!vect_match_expression_p (nodes[1], MULT_EXPR)
> >        || vect_detect_pair_op (nodes[0]) != PLUS_MINUS)
> > @@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op,
> >        || !vect_match_expression_p (l0node[1], MULT_EXPR))
> >      return IFN_LAST;
> >
> > -  bool is_neg = vect_normalize_conj_loc (left_op);
> > -
> > -  bool conj_first_operand = false;
> > -  if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
> > -                                    &conj_first_operand, true))
> > +  enum _conj_status status;
> > +  if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
> > +                                    left_op, true, &status))
> >      return IFN_LAST;
> >
> > -  if (!is_neg)
> > +  if (status == CONJ_NONE)
> >      ifn = IFN_COMPLEX_FMS;
> > -  else if (is_neg)
> > +  else
> >      ifn = IFN_COMPLEX_FMS_CONJ;
> >
> >    if (!vect_pattern_validate_optab (ifn, *ref_node))
> > @@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op,
> >        ops->quick_push (right_op[1]);
> >        ops->quick_push (left_op[1]);
> >      }
> > -  else if (kind == PERM_TOP)
> > -    {
> > -      ops->quick_push (l0node[0]);
> > -      ops->quick_push (right_op[1]);
> > -      ops->quick_push (right_op[0]);
> > -      ops->quick_push (left_op[0]);
> > -    }
> > -  else if (kind == PERM_EVENEVEN && !is_neg)
> > -    {
> > -      ops->quick_push (l0node[0]);
> > -      ops->quick_push (right_op[1]);
> > -      ops->quick_push (right_op[0]);
> > -      ops->quick_push (left_op[0]);
> > -    }
> >    else
> >      {
> >        ops->quick_push (l0node[0]);
> >        ops->quick_push (right_op[1]);
> >        ops->quick_push (right_op[0]);
> > -      ops->quick_push (left_op[1]);
> > +      ops->quick_push (left_op[0]);
> >      }
> >
> >    return ifn;
> > @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op,
> >
> >  vect_pattern*
> >  complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > +                               slp_compat_nodes_map_t *compat_cache,
> >                                 slp_tree *node)
> >  {
> >    auto_vec<slp_tree> ops;
> >    complex_operation_t op
> >      = vect_detect_pair_op (*node, true, &ops);
> >    internal_fn ifn
> > -    = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> > +    = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
> >    if (ifn == IFN_LAST)
> >      return NULL;
> >
> > @@ -1305,9 +1353,24 @@ complex_fms_pattern::build (vec_info *vinfo)
> >    SLP_TREE_CHILDREN (*this->m_node).create (3);
> >
> >    /* First re-arrange the children.  */
> > +  switch (this->m_ifn)
> > +  {
> > +    case IFN_COMPLEX_FMS:
> > +      {
> > +       SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> > +       SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> > +       break;
> > +      }
> > +    case IFN_COMPLEX_FMS_CONJ:
> > +      {
> > +       SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> > +       SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> > +       break;
> > +      }
> > +    default:
> > +      gcc_unreachable ();
> > +  }
> >    SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
> > -  SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> > -  SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> >
> >    /* And then rewrite the node itself.  */
> >    complex_pattern::build (vinfo);
> > @@ -1334,11 +1397,12 @@ class complex_operations_pattern : public complex_pattern
> >    public:
> >      void build (vec_info *);
> >      static internal_fn
> > -    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> > -            vec<slp_tree> *);
> > +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > +            slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> >
> >      static vect_pattern*
> > -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > +              slp_tree *);
> >  };
> >
> >  /* Dummy matches implementation for proxy object.  */
> > @@ -1347,6 +1411,7 @@ internal_fn
> >  complex_operations_pattern::
> >  matches (complex_operation_t /* op */,
> >          slp_tree_to_load_perm_map_t * /* perm_cache */,
> > +        slp_compat_nodes_map_t * /* compat_cache */,
> >          slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
> >  {
> >    return IFN_LAST;
> > @@ -1356,6 +1421,7 @@ matches (complex_operation_t /* op */,
> >
> >  vect_pattern*
> >  complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > +                                      slp_compat_nodes_map_t *ccache,
> >                                        slp_tree *node)
> >  {
> >    auto_vec<slp_tree> ops;
> > @@ -1363,15 +1429,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> >      = vect_detect_pair_op (*node, true, &ops);
> >    internal_fn ifn = IFN_LAST;
> >
> > -  ifn  = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> > +  ifn  = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
> >    if (ifn != IFN_LAST)
> >      return complex_fms_pattern::mkInstance (node, &ops, ifn);
> >
> > -  ifn  = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> > +  ifn  = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
> >    if (ifn != IFN_LAST)
> >      return complex_mul_pattern::mkInstance (node, &ops, ifn);
> >
> > -  ifn  = complex_add_pattern::matches (op, perm_cache, node, &ops);
> > +  ifn  = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
> >    if (ifn != IFN_LAST)
> >      return complex_add_pattern::mkInstance (node, &ops, ifn);
> >
> > @@ -1398,11 +1464,13 @@ class addsub_pattern : public vect_pattern
> >      void build (vec_info *);
> >
> >      static vect_pattern*
> > -    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > +    recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > +              slp_tree *);
> >  };
> >
> >  vect_pattern *
> > -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
> > +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
> > +                          slp_compat_nodes_map_t *, slp_tree *node_)
> >  {
> >    slp_tree node = *node_;
> >    if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644
> > --- a/gcc/tree-vect-slp.c
> > +++ b/gcc/tree-vect-slp.c
> > @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> >  /* Return true if call statements CALL1 and CALL2 are similar enough
> >     to be combined into the same SLP group.  */
> >
> > -static bool
> > +bool
> >  compatible_calls_p (gcall *call1, gcall *call2)
> >  {
> >    unsigned int nargs = gimple_call_num_args (call1);
> > @@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
> >  static bool
> >  vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> >                            slp_tree_to_load_perm_map_t *perm_cache,
> > +                          slp_compat_nodes_map_t *compat_cache,
> >                            hash_set<slp_tree> *visited)
> >  {
> >    unsigned i;
> > @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> >    slp_tree child;
> >    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> >      found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
> > -                                         vinfo, perm_cache, visited);
> > +                                         vinfo, perm_cache, compat_cache,
> > +                                         visited);
> >
> >    for (unsigned x = 0; x < num__slp_patterns; x++)
> >      {
> > -      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
> > +      vect_pattern *pattern
> > +       = slp_patterns[x] (perm_cache, compat_cache, ref_node);
> >        if (pattern)
> >         {
> >           pattern->build (vinfo);
> > @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> >  static bool
> >  vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> >                          hash_set<slp_tree> *visited,
> > -                        slp_tree_to_load_perm_map_t *perm_cache)
> > +                        slp_tree_to_load_perm_map_t *perm_cache,
> > +                        slp_compat_nodes_map_t *compat_cache)
> >  {
> >    DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> >    slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> > @@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> >                      "Analyzing SLP tree %p for patterns\n",
> >                      SLP_INSTANCE_TREE (instance));
> >
> > -  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
> > +  return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
> > +                                   visited);
> >  }
> >
> >  /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
> > @@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
> >
> >    hash_set<slp_tree> visited_patterns;
> >    slp_tree_to_load_perm_map_t perm_cache;
> > +  slp_compat_nodes_map_t compat_cache;
> >
> >    /* See if any patterns can be found in the SLP tree.  */
> >    bool pattern_found = false;
> >    FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
> >      pattern_found |= vect_match_slp_patterns (instance, vinfo,
> > -                                             &visited_patterns, &perm_cache);
> > +                                             &visited_patterns, &perm_cache,
> > +                                             &compat_cache);
> >
> >    /* If any were found optimize permutations of loads.  */
> >    if (pattern_found)
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
> >  extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
> >  extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
> >  extern void vect_free_slp_tree (slp_tree);
> > +extern bool compatible_calls_p (gcall *, gcall *);
> >
> >  /* In tree-vect-patterns.c.  */
> >  extern void
> > @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds {
> >  typedef hash_map <slp_tree, complex_perm_kinds_t>
> >    slp_tree_to_load_perm_map_t;
> >
> > +/* Cache mapping a pair of nodes to whether they are compatible.  */
> > +typedef pair_hash <nofree_ptr_hash <_slp_tree>,
> > +                  nofree_ptr_hash <_slp_tree>> slp_node_hash;
> > +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
> > +
> > +
> >  /* Vector pattern matcher base class.  All SLP pattern matchers must inherit
> >     from this type.  */
> >
> > @@ -2338,7 +2345,8 @@ class vect_pattern
> >    public:
> >
> >      /* Create a new instance of the pattern matcher class of the given type.  */
> > -    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > +    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
> > +                                   slp_compat_nodes_map_t *, slp_tree *);
> >
> >      /* Build the pattern from the data collected so far.  */
> >      virtual void build (vec_info *) = 0;
> > @@ -2352,6 +2360,7 @@ class vect_pattern
> >
> >  /* Function pointer to create a new pattern matcher from a generic type.  */
> >  typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
> > +                                             slp_compat_nodes_map_t *,
> >                                               slp_tree *);
> >
> >  /* List of supported pattern matchers.  */
> >
> >
> > --

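The rewritten `vect_validate_multiplication` in this patch replaces the hand-written permute checks with two small tables: `styles` picks which of the four flattened operands (L1, L2, R1, R2) are compared, and `perms` gives the load permute each selected operand must have. As a rough, standalone illustration only, the C++ sketch below models that lookup outside GCC. `PermKind`, `Ops`, `eq_or_top` and `matches_pattern` are hypothetical names invented for the sketch; the real code works on `slp_tree` nodes through `linear_loads_p`, and the patch does not show the body of its four-argument `is_eq_or_top`, so treating `PERM_TOP` as a per-operand wildcard here is an assumption.

```cpp
#include <array>
#include <cassert>

// Simplified, hypothetical stand-in for GCC's complex_perm_kinds_t.
enum PermKind { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN, PERM_TOP };

// Load permutes of the flattened operands, in the order L1, L2, R1, R2.
using Ops = std::array<PermKind, 4>;

// Layout table taken from the patch: indices into the flattened operands.
constexpr int kStyles[2][4] = { { 0, 2, 1, 3 },    // {L1, R1} + {L2, R2}
                                { 0, 3, 1, 2 } };  // {L1, R2} + {L2, R1}

// Expected permute of each selected operand, one row per `perm` value.
constexpr PermKind kPerms[2][4]
  = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN },
      { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD } };

// Assumed wildcard rule for this sketch: PERM_TOP matches any expected kind.
static bool eq_or_top (PermKind actual, PermKind expected)
{
  return actual == PERM_TOP || actual == expected;
}

// Returns true if OPS has the permutes required by layout STYLE and row PERM.
bool matches_pattern (const Ops &ops, int style, int perm)
{
  assert ((style == 0 || style == 1) && (perm == 0 || perm == 1));
  const int *idx = kStyles[style];
  const PermKind *want = kPerms[perm];

  // Cheapest test first: the first two selected operands must match exactly.
  if (ops[idx[0]] != want[0] || ops[idx[1]] != want[1])
    return false;

  // The remaining two selected operands may be PERM_TOP.
  return eq_or_top (ops[idx[2]], want[2]) && eq_or_top (ops[idx[3]], want[3]);
}
```

In a simplified model of a plain complex multiply `a * b`, the left multiplication pairs `a` (even, even) with `b` (even, odd) and the right one pairs `a` (odd, odd) with `b` (odd, even). Flattened, that is `{EVENEVEN, EVENODD, ODDODD, ODDEVEN}`, which matches style 0 with perm 0 and is rejected by perm 1. The point of the table form is that supporting another layout means adding a row rather than another branch.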

* RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms
  2022-01-11  7:10     ` Tamar Christina
@ 2022-02-01  9:54       ` Tamar Christina
  0 siblings, 0 replies; 18+ messages in thread
From: Tamar Christina @ 2022-02-01  9:54 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana Radhakrishnan, Richard Earnshaw, nickc, Kyrylo Tkachov

Ping x3

> -----Original Message-----
> From: Tamar Christina
> Sent: Tuesday, January 11, 2022 7:11 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; 'nickc@redhat.com' <nickc@redhat.com>;
> Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul,
> fma and fms
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Monday, December 20, 2021 4:22 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov
> > <Kyrylo.Tkachov@arm.com>
> > Subject: RE: [3/3 PATCH][AArch32] use canonical ordering for complex
> > mul, fma and fms
> >
> > Updated version of patch following AArch64 review.
> >
> > Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> >
> > Ok for master? and backport along with the first patch?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/102819
> > 	PR tree-optimization/103169
> > 	* config/arm/vec-common.md (cml<fcmac1><conj_op><mode>4): Use
> > 	canonical order.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > index e71d9b3811fde62159f5c21944fef9fe3f97b4bd..eab77ac8decce76d70f5b2594f4439e6ed363e6e 100644
> > --- a/gcc/config/arm/vec-common.md
> > +++ b/gcc/config/arm/vec-common.md
> > @@ -265,18 +265,18 @@ (define_expand "arm_vcmla<rot><mode>"
> >  ;; remainder.  Because of this, expand early.
> >  (define_expand "cml<fcmac1><conj_op><mode>4"
> >    [(set (match_operand:VF 0 "register_operand")
> > -	(plus:VF (match_operand:VF 1 "register_operand")
> > -		 (unspec:VF [(match_operand:VF 2 "register_operand")
> > -			     (match_operand:VF 3 "register_operand")]
> > -			    VCMLA_OP)))]
> > +	(plus:VF (unspec:VF [(match_operand:VF 1 "register_operand")
> > +			     (match_operand:VF 2 "register_operand")]
> > +			    VCMLA_OP)
> > +		 (match_operand:VF 3 "register_operand")))]
> >    "(TARGET_COMPLEX || (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT
> >  		      && ARM_HAVE_<MODE>_ARITH)) && !BYTES_BIG_ENDIAN"
> >  {
> >    rtx tmp = gen_reg_rtx (<MODE>mode);
> > -  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[1],
> > -					     operands[3], operands[2]));
> > +  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[3],
> > +					     operands[2], operands[1]));
> >    emit_insn (gen_arm_vcmla<rotsplit2><mode> (operands[0], tmp,
> > -					     operands[3], operands[2]));
> > +					     operands[2], operands[1]));
> >    DONE;
> >  })


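The AArch32 expander in this patch, like its AArch64 counterpart, splits one complex multiply-accumulate into two rotated VCMLA/FCMLA instructions, and the operand-order discussion hinges on which multiplication operand that instruction pair conjugates. The C++ sketch below is a scalar model of a single complex lane written from the Arm FCMLA rotation definitions; it is not GCC or ACLE code, and `fcmla`, `cmla` and `cmla_conj` are hypothetical helper names. It assumes VCMLA behaves like AArch64 FCMLA, and that the two steps use rotations 0 then 90 for the plain form and 0 then 270 for the conjugate form.

```cpp
#include <cassert>
#include <complex>

using cplx = std::complex<double>;

// Scalar model of one FCMLA/VCMLA complex lane: adds the partial product of
// N and M selected by ROT (in degrees) to ACC.  N is the first multiplication
// operand (Vn), M the second (Vm).
cplx fcmla (cplx acc, cplx n, cplx m, int rot)
{
  switch (rot)
    {
    case 0:
      return acc + cplx (n.real () * m.real (), n.real () * m.imag ());
    case 90:
      return acc + cplx (-n.imag () * m.imag (), n.imag () * m.real ());
    case 180:
      return acc + cplx (-n.real () * m.real (), -n.real () * m.imag ());
    case 270:
      return acc + cplx (n.imag () * m.imag (), -n.imag () * m.real ());
    default:
      assert (false && "rotation must be 0, 90, 180 or 270");
      return acc;
    }
}

// acc + n * m: rotation 0 adds n.real () * m, rotation 90 adds i * n.imag () * m.
cplx cmla (cplx acc, cplx n, cplx m)
{
  return fcmla (fcmla (acc, n, m, 0), n, m, 90);
}

// acc + conj (n) * m: rotation 270 subtracts i * n.imag () * m instead, so the
// conjugated value is the first multiplication operand N.
cplx cmla_conj (cplx acc, cplx n, cplx m)
{
  return fcmla (fcmla (acc, n, m, 0), n, m, 270);
}
```

With `acc = 5+6i`, `n = 1+2i` and `m = 3+4i`, `cmla` yields `0+16i` and `cmla_conj` yields `16+4i`, i.e. `acc + n*m` and `acc + conj(n)*m`. Reading the final expanders' call order `(tmp, operands[3], operands[2], operands[1])` through this model, and assuming the generator arguments are (destination, accumulator, first multiplicand, second multiplicand), the sequence computes `operands[3] + operands[2] * operands[1]` and conjugates `operands[2]` in the `_conj` variants.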

* RE: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms
  2022-01-11  7:10         ` Tamar Christina
@ 2022-02-01  9:55           ` Tamar Christina
  0 siblings, 0 replies; 18+ messages in thread
From: Tamar Christina @ 2022-02-01  9:55 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Ping x3.

> -----Original Message-----
> From: Tamar Christina
> Sent: Tuesday, January 11, 2022 7:11 AM
> To: Richard Sandiford <richard.sandiford@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: RE: [2/3 PATCH]AArch64 use canonical ordering for complex mul,
> fma and fms
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Monday, December 20, 2021 4:21 PM
> > To: Richard Sandiford <richard.sandiford@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> > <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> > Subject: RE: [2/3 PATCH]AArch64 use canonical ordering for complex
> > mul, fma and fms
> >
> >
> >
> > > -----Original Message-----
> > > From: Richard Sandiford <richard.sandiford@arm.com>
> > > Sent: Friday, December 17, 2021 4:49 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> > > <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> > > <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> > <Kyrylo.Tkachov@arm.com>
> > > Subject: Re: [2/3 PATCH]AArch64 use canonical ordering for complex
> > > mul, fma and fms
> > >
> > > Richard Sandiford <richard.sandiford@arm.com> writes:
> > > > Tamar Christina <tamar.christina@arm.com> writes:
> > > >> Hi All,
> > > >>
> > > >> After the first patch in the series this updates the optabs to
> > > >> expect the canonical sequence.
> > > >>
> > > >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >>
> > > >> Ok for master? and backport along with the first patch?
> > > >>
> > > >> Thanks,
> > > >> Tamar
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >> 	PR tree-optimization/102819
> > > >> 	PR tree-optimization/103169
> > > >> 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4,
> > > >> 	cmul<conj_op><mode>3): Use canonical order.
> > > >> 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4,
> > > >> 	cmul<conj_op><mode>3): Likewise.
> > > >>
> > > >> --- inline copy of patch --
> > > >> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > > >> index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9cfb5649f9b0e73 100644
> > > >> --- a/gcc/config/aarch64/aarch64-simd.md
> > > >> +++ b/gcc/config/aarch64/aarch64-simd.md
> > > >> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
> > > >>  ;; remainder.  Because of this, expand early.
> > > >>  (define_expand "cml<fcmac1><conj_op><mode>4"
> > > >>    [(set (match_operand:VHSDF 0 "register_operand")
> > > >> -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> > > >> -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
> > > >> -				   (match_operand:VHSDF 3 "register_operand")]
> > > >> -				   FCMLA_OP)))]
> > > >> +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> > > >> +				   (match_operand:VHSDF 2 "register_operand")]
> > > >> +				   FCMLA_OP)
> > > >> +		    (match_operand:VHSDF 3 "register_operand")))]
> > > >>    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
> > > >>  {
> > > >>    rtx tmp = gen_reg_rtx (<MODE>mode);
> > > >> -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
> > > >> -						 operands[3], operands[2]));
> > > >> +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
> > > >> +						 operands[1], operands[2]));
> > > >>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
> > > >> -						 operands[3], operands[2]));
> > > >> +						 operands[1], operands[2]));
> > > >>    DONE;
> > > >>  })
> > > >>
> > > >> @@ -583,9 +583,9 @@ (define_expand "cmul<conj_op><mode>3"
> > > >>    rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
> > > >>    rtx res1 = gen_reg_rtx (<MODE>mode);
> > > >>    emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
> > > >> -						 operands[2], operands[1]));
> > > >> +						 operands[1], operands[2]));
> > > >>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
> > > >> -						 operands[2], operands[1]));
> > > >> +						 operands[1], operands[2]));
> > > >
> > > > This doesn't look right.  Going from the documentation, patch 1
> > > > isn't changing the operand order for CMUL: the conjugated operand
> > > > (if there is one) is still operand 2.  The FCMLA sequences use the
> > > > opposite order, where the conjugated operand (if there is one) is operand 1.
> > > > So I think
> > >
> > > I meant “the first multiplication operand” rather than “operand 1” here.
> > >
> > > > the reversal here is still needed.
> > > >
> > > > Same for the multiplication operands in CML* above.
> >
> > I did actually change the order in patch 1, but didn't update the docs.
> > That was done because I followed the SLP order again, but now I've
> > updated them to do what the docs say.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? and backport along with the first patch?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/102819
> > 	PR tree-optimization/103169
> > 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4): Use
> > 	canonical order.
> > 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4): Likewise.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..9e41610fba85862ef7675bea1e5731b14cab59ce 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
> >  ;; remainder.  Because of this, expand early.
> >  (define_expand "cml<fcmac1><conj_op><mode>4"
> >    [(set (match_operand:VHSDF 0 "register_operand")
> > -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> > -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
> > -				   (match_operand:VHSDF 3 "register_operand")]
> > -				   FCMLA_OP)))]
> > +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> > +				   (match_operand:VHSDF 2 "register_operand")]
> > +				   FCMLA_OP)
> > +		    (match_operand:VHSDF 3 "register_operand")))]
> >    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
> >  {
> >    rtx tmp = gen_reg_rtx (<MODE>mode);
> > -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
> > -						 operands[3], operands[2]));
> > +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
> > +						 operands[2], operands[1]));
> >    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
> > -						 operands[3], operands[2]));
> > +						 operands[2], operands[1]));
> >    DONE;
> >  })
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> > index 9ef968840c20a3049901b3f8a919cf27ded1da3e..9ed19017c480b88779e9e3b08c0e031be60a8c12 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -7278,11 +7278,11 @@ (define_expand "cml<fcmac1><conj_op><mode>4"
> >    rtx tmp = gen_reg_rtx (<MODE>mode);
> >    emit_insn
> >      (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, operands[4],
> > -					     operands[3], operands[2],
> > -					     operands[1], operands[5]));
> > +					     operands[2], operands[1],
> > +					     operands[3], operands[5]));
> >    emit_insn
> >      (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], operands[4],
> > -					     operands[3], operands[2],
> > +					     operands[2], operands[1],
> >  					     tmp, operands[5]));
> >    DONE;
> >  })
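The review point about CMUL can be checked with a small scalar model. This is a sketch, not GCC code: `cplx`, `fcmla`, `fcmla_conj_pair` and `cmul_conj` are hypothetical names, the per-rotation formulas are assumed from the Arm description of FCMLA (accumulator, first multiplicand n, second multiplicand m), and pairing rotations 0 and 270 for the conjugate case is the model's own assumption rather than a statement of what `<rotsplit1>`/`<rotsplit2>` expand to. Under those assumptions the conjugated value is FCMLA's first multiplicand, so computing operand1 * conj (operand2) means passing operand 2 first.

```c
#include <assert.h>

/* One complex lane: re is the even element, im the odd element.  */
typedef struct { double re, im; } cplx;

/* Hypothetical model of one FCMLA lane: acc plus a rotated partial
   product of n (first multiplicand) and m (second multiplicand).
   The per-rotation formulas are an assumption of this sketch.  */
static cplx
fcmla (cplx acc, cplx n, cplx m, int rot)
{
  switch (rot)
    {
    case 0:   acc.re += n.re * m.re; acc.im += n.re * m.im; break;
    case 90:  acc.re -= n.im * m.im; acc.im += n.im * m.re; break;
    case 180: acc.re -= n.re * m.re; acc.im -= n.re * m.im; break;
    case 270: acc.re += n.im * m.im; acc.im -= n.im * m.re; break;
    default:  assert (0 && "rotation must be 0, 90, 180 or 270");
    }
  return acc;
}

/* Rotations 0 then 270 compute acc + conj (n) * m: the value that is
   conjugated is the FIRST multiplicand n.  */
static cplx
fcmla_conj_pair (cplx acc, cplx n, cplx m)
{
  return fcmla (fcmla (acc, n, m, 0), n, m, 270);
}

/* Conjugate multiply with the documented meaning op1 * conj (op2).
   op2 must be passed as n, i.e. the operands are reversed relative to
   the optab order, as in the unpatched CMUL expander.  */
static cplx
cmul_conj (cplx op1, cplx op2)
{
  const cplx zero = { 0.0, 0.0 };
  return fcmla_conj_pair (zero, op2, op1);
}
```

With a = 1+2i and b = 3+4i, `cmul_conj (a, b)` yields 11+2i, whereas passing the operands the other way round yields conj (a) * b = 11-2i. The two results differ only in the sign of the imaginary part, which is why dropping the reversal in the CMUL expander would silently change the result.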


* RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms
  2021-12-20 16:22   ` Tamar Christina
  2022-01-11  7:10     ` Tamar Christina
@ 2022-02-01  9:56     ` Kyrylo Tkachov
  1 sibling, 0 replies; 18+ messages in thread
From: Kyrylo Tkachov @ 2022-02-01  9:56 UTC
  To: Tamar Christina, gcc-patches
  Cc: nd, Ramana Radhakrishnan, Richard Earnshaw, nickc



> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, December 20, 2021 4:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms
> 
> Updated version of patch following AArch64 review.
> 
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master? and backport along with the first patch?

Ok, sorry I missed it.
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/102819
> 	PR tree-optimization/103169
> 	* config/arm/vec-common.md (cml<fcmac1><conj_op><mode>4): Use
> 	canonical order.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index e71d9b3811fde62159f5c21944fef9fe3f97b4bd..eab77ac8decce76d70f5b2594f4439e6ed363e6e 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -265,18 +265,18 @@ (define_expand "arm_vcmla<rot><mode>"
>  ;; remainder.  Because of this, expand early.
>  (define_expand "cml<fcmac1><conj_op><mode>4"
>    [(set (match_operand:VF 0 "register_operand")
> -	(plus:VF (match_operand:VF 1 "register_operand")
> -		 (unspec:VF [(match_operand:VF 2 "register_operand")
> -			     (match_operand:VF 3 "register_operand")]
> -			    VCMLA_OP)))]
> +	(plus:VF (unspec:VF [(match_operand:VF 1 "register_operand")
> +			     (match_operand:VF 2 "register_operand")]
> +			    VCMLA_OP)
> +		 (match_operand:VF 3 "register_operand")))]
>    "(TARGET_COMPLEX || (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT
>  		      && ARM_HAVE_<MODE>_ARITH)) && !BYTES_BIG_ENDIAN"
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
> -  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[1],
> -					     operands[3], operands[2]));
> +  emit_insn (gen_arm_vcmla<rotsplit1><mode> (tmp, operands[3],
> +					     operands[2], operands[1]));
>    emit_insn (gen_arm_vcmla<rotsplit2><mode> (operands[0], tmp,
> -					     operands[3], operands[2]));
> +					     operands[2], operands[1]));
>    DONE;
>  })
> 



* Re: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms
  2021-12-20 16:20       ` Tamar Christina
  2022-01-11  7:10         ` Tamar Christina
@ 2022-02-01 11:04         ` Richard Sandiford
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Sandiford @ 2022-02-01 11:04 UTC
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Friday, December 17, 2021 4:49 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Subject: Re: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms
>> 
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > Tamar Christina <tamar.christina@arm.com> writes:
>> >> Hi All,
>> >>
>> >> After the first patch in the series this updates the optabs to expect
>> >> the canonical sequence.
>> >>
>> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >>
>> >> Ok for master? and backport along with the first patch?
>> >>
>> >> Thanks,
>> >> Tamar
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >> 	PR tree-optimization/102819
>> >> 	PR tree-optimization/103169
>> >> 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4,
>> >> 	cmul<conj_op><mode>3): Use canonical order.
>> >> 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4,
>> >> 	cmul<conj_op><mode>3): Likewise.
>> >>
>> >> --- inline copy of patch --
>> >> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> >> index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9cfb5649f9b0e73 100644
>> >> --- a/gcc/config/aarch64/aarch64-simd.md
>> >> +++ b/gcc/config/aarch64/aarch64-simd.md
>> >> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
>> >>  ;; remainder.  Because of this, expand early.
>> >>  (define_expand "cml<fcmac1><conj_op><mode>4"
>> >>    [(set (match_operand:VHSDF 0 "register_operand")
>> >> -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
>> >> -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
>> >> -				   (match_operand:VHSDF 3 "register_operand")]
>> >> -				   FCMLA_OP)))]
>> >> +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
>> >> +				   (match_operand:VHSDF 2 "register_operand")]
>> >> +				   FCMLA_OP)
>> >> +		    (match_operand:VHSDF 3 "register_operand")))]
>> >>    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
>> >>  {
>> >>    rtx tmp = gen_reg_rtx (<MODE>mode);
>> >> -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
>> >> -						 operands[3], operands[2]));
>> >> +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
>> >> +						 operands[1], operands[2]));
>> >>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
>> >> -						 operands[3], operands[2]));
>> >> +						 operands[1], operands[2]));
>> >>    DONE;
>> >>  })
>> >>
>> >> @@ -583,9 +583,9 @@ (define_expand "cmul<conj_op><mode>3"
>> >>    rtx tmp = force_reg (<MODE>mode, CONST0_RTX (<MODE>mode));
>> >>    rtx res1 = gen_reg_rtx (<MODE>mode);
>> >>    emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (res1, tmp,
>> >> -						 operands[2], operands[1]));
>> >> +						 operands[1], operands[2]));
>> >>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], res1,
>> >> -						 operands[2], operands[1]));
>> >> +						 operands[1], operands[2]));
>> >
>> > This doesn't look right.  Going from the documentation, patch 1 isn't
>> > changing the operand order for CMUL: the conjugated operand (if there
>> > is one) is still operand 2.  The FCMLA sequences use the opposite
>> > order, where the conjugated operand (if there is one) is operand 1.
>> > So I think
>> 
>> I meant “the first multiplication operand” rather than “operand 1” here.
>> 
>> > the reversal here is still needed.
>> >
>> > Same for the multiplication operands in CML* above.
>
> I did actually change the order in patch 1, but didn't update the docs.
> That was done because I followed the SLP order again, but now I've updated
> them to do what the docs say.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and backport along with the first patch?

OK, thanks.

Richard

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/102819
> 	PR tree-optimization/103169
> 	* config/aarch64/aarch64-simd.md (cml<fcmac1><conj_op><mode>4): Use
> 	canonical order.
> 	* config/aarch64/aarch64-sve.md (cml<fcmac1><conj_op><mode>4): Likewise.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..9e41610fba85862ef7675bea1e5731b14cab59ce 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane<rot><mode>"
>  ;; remainder.  Because of this, expand early.
>  (define_expand "cml<fcmac1><conj_op><mode>4"
>    [(set (match_operand:VHSDF 0 "register_operand")
> -	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> -		    (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
> -				   (match_operand:VHSDF 3 "register_operand")]
> -				   FCMLA_OP)))]
> +	(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> +				   (match_operand:VHSDF 2 "register_operand")]
> +				   FCMLA_OP)
> +		    (match_operand:VHSDF 3 "register_operand")))]
>    "TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
>  {
>    rtx tmp = gen_reg_rtx (<MODE>mode);
> -  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[1],
> -						 operands[3], operands[2]));
> +  emit_insn (gen_aarch64_fcmla<rotsplit1><mode> (tmp, operands[3],
> +						 operands[2], operands[1]));
>    emit_insn (gen_aarch64_fcmla<rotsplit2><mode> (operands[0], tmp,
> -						 operands[3], operands[2]));
> +						 operands[2], operands[1]));
>    DONE;
>  })
>  
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 9ef968840c20a3049901b3f8a919cf27ded1da3e..9ed19017c480b88779e9e3b08c0e031be60a8c12 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -7278,11 +7278,11 @@ (define_expand "cml<fcmac1><conj_op><mode>4"
>    rtx tmp = gen_reg_rtx (<MODE>mode);
>    emit_insn
>      (gen_aarch64_pred_fcmla<sve_rot1><mode> (tmp, operands[4],
> -					     operands[3], operands[2],
> -					     operands[1], operands[5]));
> +					     operands[2], operands[1],
> +					     operands[3], operands[5]));
>    emit_insn
>      (gen_aarch64_pred_fcmla<sve_rot2><mode> (operands[0], operands[4],
> -					     operands[3], operands[2],
> +					     operands[2], operands[1],
>  					     tmp, operands[5]));
>    DONE;
>  })
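The operand mapping of the final `cml<fcmac1><conj_op><mode>4` expander can be modelled in the same scalar style. `expand_cml` below is a hypothetical function that mirrors the two `gen_aarch64_fcmla` calls in the patched Advanced SIMD expander: operands[3] is the accumulator and operands[2] is passed as FCMLA's first multiplicand. The sketch assumes the conjugate form computes operand1 * conj (operand2) + operand3, and the rotation pairs (0, 90) and (0, 270) stand in for `<rotsplit1>`/`<rotsplit2>`; both are assumptions, and the per-rotation FCMLA formulas are likewise assumed from the Arm description of the instruction.

```c
#include <assert.h>

/* One complex lane: re is the even element, im the odd element.  */
typedef struct { double re, im; } cplx;

/* Hypothetical model of one FCMLA lane (accumulator acc, first
   multiplicand n, second multiplicand m).  Only the rotations used
   below are modelled.  */
static cplx
fcmla (cplx acc, cplx n, cplx m, int rot)
{
  switch (rot)
    {
    case 0:   acc.re += n.re * m.re; acc.im += n.re * m.im; break;
    case 90:  acc.re -= n.im * m.im; acc.im += n.im * m.re; break;
    case 270: acc.re += n.im * m.im; acc.im -= n.im * m.re; break;
    default:  assert (0 && "rotation not modelled in this sketch");
    }
  return acc;
}

/* Mirrors the patched SIMD expander.  operands[0] is ignored; the
   returned value is what would be written to it.  operands[1] and
   operands[2] are the multiplication operands and operands[3] is the
   accumulator.  rot1/rot2 stand in for <rotsplit1>/<rotsplit2>.  */
static cplx
expand_cml (const cplx operands[4], int rot1, int rot2)
{
  cplx tmp = fcmla (operands[3], operands[2], operands[1], rot1);
  return fcmla (tmp, operands[2], operands[1], rot2);
}
```

With operands 1+2i and 3+4i and accumulator 1+i, the (0, 90) pair gives -4+11i and the (0, 270) pair gives 12+3i, i.e. (1+2i) * conj (3+4i) + (1+i). For the plain multiply-accumulate the complex product is commutative, so swapping operands 1 and 2 does not change the result; only the conjugate form exposes the operand order, which fits with the review concentrating on where the conjugated operand ends up.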


Thread overview: 18+ messages
2021-12-17 15:42 [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Tamar Christina
2021-12-17 15:42 ` [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms Tamar Christina
2021-12-17 16:24   ` Richard Sandiford
2021-12-17 16:48     ` Richard Sandiford
2021-12-20 16:20       ` Tamar Christina
2022-01-11  7:10         ` Tamar Christina
2022-02-01  9:55           ` Tamar Christina
2022-02-01 11:04         ` Richard Sandiford
2021-12-17 15:43 ` [3/3 PATCH][AArch32] " Tamar Christina
2021-12-20 16:22   ` Tamar Christina
2022-01-11  7:10     ` Tamar Christina
2022-02-01  9:54       ` Tamar Christina
2022-02-01  9:56     ` Kyrylo Tkachov
2021-12-17 16:18 ` [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines Richard Sandiford
2021-12-20 16:18   ` Tamar Christina
2022-01-10 10:16     ` Tamar Christina
2022-01-10 13:00 ` Richard Biener
2022-01-11  7:31   ` Tamar Christina
