From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>,
Richard Guenther <rguenther@suse.de>
Subject: RE: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
Date: Tue, 11 Jan 2022 07:31:19 +0000
Message-ID: <VI1PR08MB5325F1F961DD15AD36539369FF519@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <CAFiYyc3zjSp=WtV9MhckoTSTQRutdgBHJvRfpAW=tiJpqfhUOg@mail.gmail.com>
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Monday, January 10, 2022 1:00 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Richard Guenther <rguenther@suse.de>
> Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.
>
> On Fri, Dec 17, 2021 at 4:44 PM Tamar Christina via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This patch boosts the analysis for complex mul, fma and fms in order to
> > ensure that it doesn't create an incorrect output.
> >
> > Essentially it adds an extra verification to check that the two nodes it's going
> > to combine do the same operations on compatible values. The reason it needs to
> > do this is that if one computation differs from the other then with the current
> > implementation we have no way to deal with it since we have to remove the
> > permute.
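To make "compatible values" concrete, here is a small standalone C sketch (not GCC code; the function names are hypothetical) that evaluates one element of the loop bodies from the new tests pr102819-3.c (good1) and pr102819-2.c (bad1) and compares them with a real complex product. In good1 both halves agree on one second factor w = (f[2][r] + v2) + (f[2][i] + v1)i. In bad1 the real half implies w = (f[2][r] + v1) + (f[2][i] + v2)i while the imaginary half implies the swapped form, so for v1 != v2 no single w fits and the two halves must not be combined into a COMPLEX_MUL.

```c
#include <complex.h>
#include <stdbool.h>

/* One element of the pr102819-3.c (good1) loop: z = (zr, zi) is
   (f[1][r], f[1][i]) and (wr, wi) is (f[2][r], f[2][i]).  Both halves use
   the same second factor w = (wr + v2) + (wi + v1) i.  */
static void good1_body (float zr, float zi, float wr, float wi,
                        float v1, float v2, float out[2])
{
  out[0] = zr * (wr + v2) - zi * (wi + v1);
  out[1] = zr * (wi + v1) + zi * (wr + v2);
}

/* One element of the pr102819-2.c (bad1) loop: the real half reads
   (wr + v1) and (wi + v2), the imaginary half (wi + v1) and (wr + v2),
   so the halves disagree on what the second factor is.  */
static void bad1_body (float zr, float zi, float wr, float wi,
                       float v1, float v2, float out[2])
{
  out[0] = zr * (wr + v1) - zi * (wi + v2);
  out[1] = zr * (wi + v1) + zi * (wr + v2);
}

/* True if OUT equals the <complex.h> product (zr + zi i) * (wr + wi i).  */
static bool is_complex_product (const float out[2], float zr, float zi,
                                float wr, float wi)
{
  float complex p = (zr + zi * I) * (wr + wi * I);
  return out[0] == crealf (p) && out[1] == cimagf (p);
}
```

With z = 1 + 2i, (wr, wi) = (3, 4), v1 = 1 and v2 = 2, good1 yields (1 + 2i)(5 + 5i) = -5 + 15i, while bad1 yields -8 + 15i, which is not that product.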
> >
> > When we can keep the permute around we can probably handle these by unrolling.
> >
> > While implementing this, since I have to do the traversal anyway, I took advantage
> > of it by simplifying the code a bit. Previously we would determine whether
> > something is a conjugate, then try to figure out which conjugate it is, and
> > then try to see if the permutes match what we expect.
> >
> > Now the code that does the traversal will detect this in one go and return to us
> > whether the operation is something that can be combined and whether a conjugate
> > is present.
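As a rough scalar illustration of what "a conjugate is present" means, the sketch below (plain C, hypothetical helper name) writes a * conj (b) out by hand. It is an ordinary complex multiply applied to (br, -bi), so the only structural difference from a plain multiply is one negation on an input, which is the kind of NEGATE_EXPR the new vect_validate_multiplication checks for on its right-hand operands.

```c
#include <complex.h>

/* Hypothetical helper: a * conj (b) for a = ar + ai i, b = br + bi i.  */
static float complex mul_conj_expanded (float ar, float ai, float br, float bi)
{
  float nbi = -bi;                /* conj (b) negates the imaginary part */
  float re = ar * br - ai * nbi;  /* plain complex-multiply shape ...    */
  float im = ar * nbi + ai * br;  /* ... applied to (br, -bi)            */
  return re + im * I;
}
```

For a = 1 + 2i and b = 3 + 4i this gives (1 + 2i)(3 - 4i) = 11 + 2i, the same as (1 + 2i) * conjf (3 + 4i).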
> >
> > Secondly, because it does this, I can now simplify the checking code itself to
> > essentially just try to apply fixed patterns to each operation.
> >
> > The patterns represent the order operations should appear in. For instance a
> > complex MUL operation combines:
> >
> > Left 1 + Right 1
> > Left 2 + Right 2
> >
> > with a permute on the nodes consisting of:
> >
> > { Even, Even } + { Odd, Odd }
> > { Even, Odd } + { Odd, Even }
> >
> > By abstracting over these patterns the checking code becomes quite simple.
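The lane pairing above can be modelled in plain C on interleaved { re, im } float arrays. This is a sketch that assumes EVENEVEN/ODDODD mean "duplicate the real/imaginary lane" and EVENODD/ODDEVEN mean "keep/swap the two lanes"; the enum and helper names are hypothetical and only mimic GCC's PERM_* kinds. The { Even, Even } and { Odd, Odd } views come from one input and the { Even, Odd } and { Odd, Even } views from the other, and the two products are combined by subtracting in the real lane and adding in the imaginary lane.

```c
#include <stddef.h>

/* Hypothetical names that only mimic GCC's PERM_* kinds.  */
typedef enum { EVENEVEN, ODDODD, EVENODD, ODDEVEN } perm_kind;

/* Lane LANE (0 = real, 1 = imaginary) of complex element K of the
   interleaved array V after applying permute P.  */
static float permuted_lane (const float *v, size_t k, perm_kind p, int lane)
{
  float re = v[2 * k], im = v[2 * k + 1];
  switch (p)
    {
    case EVENEVEN: return re;                   /* { re, re }           */
    case ODDODD:   return im;                   /* { im, im }           */
    case EVENODD:  return lane == 0 ? re : im;  /* { re, im }: identity */
    case ODDEVEN:  return lane == 0 ? im : re;  /* { im, re }: swap     */
    }
  return 0.0f;
}

/* c[k] = a[k] * b[k] for N complex floats stored as interleaved pairs.  */
static void cmul_interleaved (const float *a, const float *b, float *c,
                              size_t n)
{
  for (size_t k = 0; k < n; k++)
    for (int lane = 0; lane < 2; lane++)
      {
        /* { Even, Even } view of a times { Even, Odd } view of b.  */
        float p1 = permuted_lane (a, k, EVENEVEN, lane)
                   * permuted_lane (b, k, EVENODD, lane);
        /* { Odd, Odd } view of a times { Odd, Even } view of b.  */
        float p2 = permuted_lane (a, k, ODDODD, lane)
                   * permuted_lane (b, k, ODDEVEN, lane);
        c[2 * k + lane] = lane == 0 ? p1 - p2 : p1 + p2;
      }
}
```

With a = { 1, 2, 3, -1 } and b = { 3, 4, 2, 5 }, the output is { -5, 10, 11, 13 }, i.e. (1 + 2i)(3 + 4i) and (3 - i)(2 + 5i).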
> >
> > As part of this I was checking the order of the operands, which was left in
> > "slp" order, i.e. the same order they showed up in during SLP, which means
> > that the accumulator is first. However, it looks like I didn't document this
> > and the x86 optab was implemented assuming the same order as FMA, i.e. that
> > the accumulator is last.
> >
> > I have thus changed the order to match that of FMA and FMS, which corrects the
> > x86 codegen and will update the Arm targets. This has now also been
> > documented.
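A hypothetical reference model of the operand order now documented for cfma and cfma_conj, written as plain C scalar functions: the accumulator is the last argument, in the same slot as z in fmaf (x, y, z), instead of first as in the old SLP order. The functions only restate the md.texi loops (op2[i] += op0[i] * op1[i] and op2[i] += op0[i] * conj (op1[i])); they are not GCC's implementation.

```c
#include <complex.h>

/* Hypothetical reference for cfma: accumulator passed last.  */
static float complex cfma_ref (float complex op0, float complex op1,
                               float complex acc)
{
  return op0 * op1 + acc;
}

/* Hypothetical reference for cfma_conj: second multiplicand conjugated,
   accumulator still last.  */
static float complex cfma_conj_ref (float complex op0, float complex op1,
                                    float complex acc)
{
  return op0 * conjf (op1) + acc;
}
```

For op0 = 1 + 2i, op1 = 3 + 4i and acc = 1 + i, cfma_ref gives -4 + 11i and cfma_conj_ref gives 12 + 3i.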
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > x86_64-pc-linux-gnu and no regressions.
> >
> > Ok for master? and backport to GCC 11 after some stew?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/102819
> > PR tree-optimization/103169
> > * doc/md.texi: Update docs for cfms, cfma.
> > * tree-data-ref.h (same_data_refs): Accept optional offset.
> > * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
> > patterns.
> > (vect_normalize_conj_loc): Remove.
> > (is_eq_or_top): Change to take two nodes.
> > (enum _conj_status, compatible_complex_nodes_p,
> > vect_validate_multiplication): New.
> > (class complex_add_pattern, complex_add_pattern::matches,
> > complex_add_pattern::recognize, class complex_mul_pattern,
> > complex_mul_pattern::recognize, class complex_fms_pattern,
> > complex_fms_pattern::recognize, class complex_operations_pattern,
> > complex_operations_pattern::recognize, addsub_pattern::recognize): Pass
> > new cache.
> > (complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
> > cache and use new validation code.
> > * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
> > vect_analyze_slp): Pass along cache.
> > (compatible_calls_p): Expose.
> > * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
> > slp_compat_nodes_map_t): New.
> > (class vect_pattern): Update signatures to include new cache.
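The is_linear_load_p change (load % 2) can be illustrated with a simplified, hypothetical model of only the two candidate checks visible in the hunk: one candidate survives only if every lane loads an odd element, the other only if every lane loads an even element. When a group covers more than one complex number, for example with a load permutation such as { 0, 0, 2, 2 }, the old exact comparison against 0 or 1 rejects it, while comparing load % 2 accepts it. The function name and the REDUCE switch are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: true if every entry of LOADS has parity PARITY.
   REDUCE = false is the old exact test (load == PARITY); REDUCE = true is
   the patched test on adj_load = load % 2.  */
static bool all_lanes_load_parity (const unsigned *loads, size_t n,
                                   unsigned parity, bool reduce)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned adj_load = reduce ? loads[i] % 2 : loads[i];
      if (adj_load != parity)
        return false;
    }
  return true;
}
```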
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/102819
> > PR tree-optimization/103169
> > * g++.dg/vect/pr99149.cc: xfail for now.
> > * gcc.dg/vect/complex/pr102819-1.c: New test.
> > * gcc.dg/vect/complex/pr102819-2.c: New test.
> > * gcc.dg/vect/complex/pr102819-3.c: New test.
> > * gcc.dg/vect/complex/pr102819-4.c: New test.
> > * gcc.dg/vect/complex/pr102819-5.c: New test.
> > * gcc.dg/vect/complex/pr102819-6.c: New test.
> > * gcc.dg/vect/complex/pr102819-7.c: New test.
> > * gcc.dg/vect/complex/pr102819-8.c: New test.
> > * gcc.dg/vect/complex/pr102819-9.c: New test.
> > * gcc.dg/vect/complex/pr103169.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is semantically the same as
> > a multiply and accumulate of complex numbers.
> >
> > @smallexample
> > - complex TYPE c[N];
> > - complex TYPE a[N];
> > - complex TYPE b[N];
> > + complex TYPE op0[N];
> > + complex TYPE op1[N];
> > + complex TYPE op2[N];
> > for (int i = 0; i < N; i += 1)
> > @{
> > - c[i] += a[i] * b[i];
> > + op2[i] += op0[i] * op1[i];
> > @}
> > @end smallexample
> >
> > @@ -6348,12 +6348,12 @@ the same as a multiply and accumulate of complex numbers where the second
> > multiply arguments is conjugated.
> >
> > @smallexample
> > - complex TYPE c[N];
> > - complex TYPE a[N];
> > - complex TYPE b[N];
> > + complex TYPE op0[N];
> > + complex TYPE op1[N];
> > + complex TYPE op2[N];
> > for (int i = 0; i < N; i += 1)
> > @{
> > - c[i] += a[i] * conj (b[i]);
> > + op2[i] += op0[i] * conj (op1[i]);
> > @}
> > @end smallexample
> >
> > @@ -6370,12 +6370,12 @@ Perform a vector multiply and subtract that is semantically the same as
> > a multiply and subtract of complex numbers.
> >
> > @smallexample
> > - complex TYPE c[N];
> > - complex TYPE a[N];
> > - complex TYPE b[N];
> > + complex TYPE op0[N];
> > + complex TYPE op1[N];
> > + complex TYPE op2[N];
> > for (int i = 0; i < N; i += 1)
> > @{
> > - c[i] -= a[i] * b[i];
> > + op2[i] -= op0[i] * op1[i];
> > @}
> > @end smallexample
> >
> > @@ -6393,12 +6393,12 @@ the same as a multiply and subtract of complex numbers where the second
> > multiply arguments is conjugated.
> >
> > @smallexample
> > - complex TYPE c[N];
> > - complex TYPE a[N];
> > - complex TYPE b[N];
> > + complex TYPE op0[N];
> > + complex TYPE op1[N];
> > + complex TYPE op2[N];
> > for (int i = 0; i < N; i += 1)
> > @{
> > - c[i] -= a[i] * conj (b[i]);
> > + op2[i] -= op0[i] * conj (op1[i]);
> > @}
> > @end smallexample
> >
> > @@ -6415,12 +6415,12 @@ Perform a vector multiply that is semantically the same as multiply of
> > complex numbers.
> >
> > @smallexample
> > - complex TYPE c[N];
> > - complex TYPE a[N];
> > - complex TYPE b[N];
> > + complex TYPE op0[N];
> > + complex TYPE op1[N];
> > + complex TYPE op2[N];
> > for (int i = 0; i < N; i += 1)
> > @{
> > - c[i] = a[i] * b[i];
> > + op2[i] = op0[i] * op1[i];
> > @}
> > @end smallexample
> >
> > @@ -6437,12 +6437,12 @@ Perform a vector multiply by conjugate that is semantically the same as a
> > multiply of complex numbers where the second multiply arguments is conjugated.
> >
> > @smallexample
> > - complex TYPE c[N];
> > - complex TYPE a[N];
> > - complex TYPE b[N];
> > + complex TYPE op0[N];
> > + complex TYPE op1[N];
> > + complex TYPE op2[N];
> > for (int i = 0; i < N; i += 1)
> > @{
> > - c[i] = a[i] * conj (b[i]);
> > + op2[i] = op0[i] * conj (op1[i]);
> > @}
> > @end smallexample
> >
> > diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
> > index e6e0594a336fa053ffba64a12e2de43a4e373f49..bb9f5fa89f12b184368bf5488d6e9432c2166463 100755
> > --- a/gcc/testsuite/g++.dg/vect/pr99149.cc
> > +++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
> > @@ -24,4 +24,4 @@ public:
> > } n;
> > main() { n.j(); }
> >
> > -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
> > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" { xfail { vect_float } } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..46b9a55f05279d732fa1418e02f779cf693ede07
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-1.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad1(float v1, float v2)
> > +{
> > + for (int r = 0; r < 100; r += 4)
> > + {
> > + int i = r + 1;
> > + f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> > + f[0][r+2] = f[1][r+2] * (f[2][r+2] + v2) - f[1][i+2] * (f[2][i+2] + v1);
> > + f[0][i+2] = f[1][r+2] * (f[2][i+2] + v1) + f[1][i+2] * (f[2][r+2] + v2);
> > + // ^^^^^^^ ^^^^^^^
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ffe646efe57f7ad07541b0fb96601596f46dc5f8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-2.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad1(float v1, float v2)
> > +{
> > + for (int r = 0; r < 100; r += 2)
> > + {
> > + int i = r + 1;
> > + f[0][r] = f[1][r] * (f[2][r] + v1) - f[1][i] * (f[2][i] + v2);
> > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..5f98aa204d8b11b0cb433f8965dbb72cf8940de1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-3.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void good1(float v1, float v2)
> > +{
> > + for (int r = 0; r < 100; r += 2)
> > + {
> > + int i = r + 1;
> > + f[0][r] = f[1][r] * (f[2][r] + v2) - f[1][i] * (f[2][i] + v1);
> > + f[0][i] = f[1][r] * (f[2][i] + v1) + f[1][i] * (f[2][r] + v2);
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..882851789c5085e734000609114be480d3b08bd0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-4.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void good1()
> > +{
> > + for (int r = 0; r < 100; r += 2)
> > + {
> > + int i = r + 1;
> > + f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[2][i];
> > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6a2d549d65f3f27d407fb0bd469473e6a5c333ae
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-5.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void good2()
> > +{
> > + for (int r = 0; r < 100; r += 2)
> > + {
> > + int i = r + 1;
> > + f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * (f[2][i] + 1);
> > + f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * (f[2][r] + 1);
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..71e66dbe3b29eec1fffb8df9b216022fdc0af54e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-6.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad1()
> > +{
> > + for (int r = 0; r < 100; r += 2)
> > + {
> > + int i = r + 1;
> > + f[0][r] = f[1][r] * f[2][r] - f[1][i] * f[3][i];
> > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[3][r];
> > + // ^^^^^^^ ^^^^^^^
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..536672f3c8bb474ad5fa4bb61b3a36b555acf3cf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-7.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad2()
> > +{
> > + for (int r = 0; r < 100; r += 2)
> > + {
> > + int i = r + 1;
> > + f[0][r] = f[1][r] * (f[2][r] + 1) - f[1][i] * f[2][i];
> > + f[0][i] = f[1][r] * (f[2][i] + 1) + f[1][i] * f[2][r];
> > + // ^^^^
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..07b48148688b7d530e5891d023d558b58a485c23
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-8.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +float f[12][100];
> > +
> > +void bad3()
> > +{
> > + for (int r = 0; r < 100; r += 2)
> > + {
> > + int i = r + 1;
> > + f[0][r] = f[1][r] * f[2][r] - f[1][r] * f[2][i];
> > + f[0][i] = f[1][r] * f[2][i] + f[1][i] * f[2][r];
> > + // ^^^^^^^
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "Found COMPLEX_MUL" "vect" { target { vect_float } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7655852434b21b381fe7ee316e8caf3d485b8ee1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr102819-9.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +
> > +#include <stdio.h>
> > +#include <complex.h>
> > +
> > +#define N 200
> > +#define TYPE float
> > +#define TYPE2 float
> > +
> > +void g (TYPE2 complex a[restrict N], TYPE complex b[restrict N], TYPE complex c[restrict N])
> > +{
> > + for (int i=0; i < N; i++)
> > + {
> > + c[i] -= a[i] * b[0];
> > + }
> > +}
> > +
> > +/* The pattern overlaps with COMPLEX_ADD so we need to support consuming ADDs in COMPLEX_FMS. */
> > +
> > +/* { dg-final { scan-tree-dump "Found COMPLEX_FMS" "vect" { xfail { vect_float } } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/complex/pr103169.c b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..1bfabbd85a0eedfb4156a82574324126e9083fc5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/complex/pr103169.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile { target { vect_double } } } */
> > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > +/* { dg-additional-options "-O2 -fvect-cost-model=unlimited" } */
> > +
> > +_Complex double b_0, c_0;
> > +
> > +void
> > +mul270snd (void)
> > +{
> > + c_0 = b_0 * 1.0iF * 1.0iF;
> > +}
> > +
> > diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> > index 74f579c9f3f23bac25d21546068c2ab43209aa2b..8ad5fa521279b20fa5e63eecf442d5dc5c16e7ee 100644
> > --- a/gcc/tree-data-ref.h
> > +++ b/gcc/tree-data-ref.h
> > @@ -600,10 +600,11 @@ same_data_refs_base_objects (data_reference_p a, data_reference_p b)
> > }
> >
> > /* Return true when the data references A and B are accessing the same
> > - memory object with the same access functions. */
> > + memory object with the same access functions. Optionally skip the
> > + last OFFSET dimensions in the data reference. */
>
> But you skip the _first_ dimensions?
That's because the dimensions seem to be laid out in reverse order, i.e.
float f[12][200] with an access as f[1][r] gets a DR as:
>>> p debug (dr1)
#(Data Ref:
# bb: 3
# stmt: _1 = f[1][r_20];
# ref: f[1][r_20];
# base_object: f;
# Access function 0: {0, +, 2}_1
# Access function 1: 1
#)
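So index 0 holds the outermost ARRAY_REF, i.e. the last subscript in the source (r_20 here), which is why skipping the first OFFSET access functions skips the last OFFSET dimensions.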
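A toy model of the patched loop, with access functions reduced to hypothetical {base, +, step} pairs stored in the order shown in the dump above (index 0 holds r). It is a sketch, not GCC's data structures, but it shows why OFFSET = 1 lets f[1][r] and f[1][r + 1] compare equal while f[1][r] and f[2][r] still differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an affine evolution and a data reference.
   fn[0] is the last source subscript, fn[1] the one before it, etc.  */
struct toy_chrec { long base, step; };
struct toy_dr { int base_object; size_t ndims; struct toy_chrec fn[4]; };

/* Mirrors the patched loop: compare access functions from OFFSET upwards.  */
static bool toy_same_data_refs (const struct toy_dr *a,
                                const struct toy_dr *b, size_t offset)
{
  /* Roughly the role of same_data_refs_base_objects.  */
  if (a->base_object != b->base_object || a->ndims != b->ndims)
    return false;
  for (size_t i = offset; i < a->ndims; i++)
    if (a->fn[i].base != b->fn[i].base || a->fn[i].step != b->fn[i].step)
      return false;
  return true;
}
```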
Cheers,
Tamar
>
> Otherwise looks OK to me.
>
> Thanks,
> Richard.
>
> > static inline bool
> > -same_data_refs (data_reference_p a, data_reference_p b)
> > +same_data_refs (data_reference_p a, data_reference_p b, int offset = 0)
> > {
> > unsigned int i;
> >
> > @@ -614,7 +615,7 @@ same_data_refs (data_reference_p a, data_reference_p b)
> > if (!same_data_refs_base_objects (a, b))
> > return false;
> >
> > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> > + for (i = offset; i < DR_NUM_DIMENSIONS (a); i++)
> > if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
> > return false;
> >
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > index 0350441fad9690cd5d04337171ca3470a064a571..f8da4153632a700680091f37305a5d3078fbb0c5 100644
> > --- a/gcc/tree-vect-slp-patterns.c
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -149,12 +149,13 @@ is_linear_load_p (load_permutation_t loads)
> > int valid_patterns = 4;
> > FOR_EACH_VEC_ELT (loads, i, load)
> > {
> > - if (candidates[0] != PERM_UNKNOWN && load != 1)
> > + unsigned adj_load = load % 2;
> > + if (candidates[0] != PERM_UNKNOWN && adj_load != 1)
> > {
> > candidates[0] = PERM_UNKNOWN;
> > valid_patterns--;
> > }
> > - if (candidates[1] != PERM_UNKNOWN && load != 0)
> > + if (candidates[1] != PERM_UNKNOWN && adj_load != 0)
> > {
> > candidates[1] = PERM_UNKNOWN;
> > valid_patterns--;
> > @@ -596,11 +597,12 @@ class complex_add_pattern : public complex_pattern
> > public:
> > void build (vec_info *);
> > static internal_fn
> > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> > - vec<slp_tree> *);
> > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> >
> > static vect_pattern*
> > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > + slp_tree *);
> >
> > static vect_pattern*
> > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > @@ -647,6 +649,7 @@ complex_add_pattern::build (vec_info *vinfo)
> > internal_fn
> > complex_add_pattern::matches (complex_operation_t op,
> > slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t * /* compat_cache */,
> > slp_tree *node, vec<slp_tree> *ops)
> > {
> > internal_fn ifn = IFN_LAST;
> > @@ -692,13 +695,14 @@ complex_add_pattern::matches (complex_operation_t op,
> >
> > vect_pattern*
> > complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t *compat_cache,
> > slp_tree *node)
> > {
> > auto_vec<slp_tree> ops;
> > complex_operation_t op
> > = vect_detect_pair_op (*node, true, &ops);
> > internal_fn ifn
> > - = complex_add_pattern::matches (op, perm_cache, node, &ops);
> > + = complex_add_pattern::matches (op, perm_cache, compat_cache, node, &ops);
> > if (ifn == IFN_LAST)
> > return NULL;
> >
> > @@ -709,147 +713,214 @@ complex_add_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > * complex_mul_pattern
> > ******************************************************************************/
> >
> > -/* Check to see if either of the trees in ARGS are a NEGATE_EXPR. If the first
> > - child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> > -
> > - If a negate is found then the values in ARGS are reordered such that the
> > - negate node is always the second one and the entry is replaced by the child
> > - of the negate node. */
> > +/* Helper function to check if PERM is KIND or PERM_TOP. */
> >
> > static inline bool
> > -vect_normalize_conj_loc (vec<slp_tree> &args, bool *neg_first_p = NULL)
> > +is_eq_or_top (slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_tree op1, complex_perm_kinds_t kind1,
> > + slp_tree op2, complex_perm_kinds_t kind2)
> > {
> > - gcc_assert (args.length () == 2);
> > - bool neg_found = false;
> > -
> > - if (vect_match_expression_p (args[0], NEGATE_EXPR))
> > - {
> > - std::swap (args[0], args[1]);
> > - neg_found = true;
> > - if (neg_first_p)
> > - *neg_first_p = true;
> > - }
> > - else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> > - {
> > - neg_found = true;
> > - if (neg_first_p)
> > - *neg_first_p = false;
> > - }
> > + complex_perm_kinds_t perm1 = linear_loads_p (perm_cache, op1);
> > + if (perm1 != kind1 && perm1 != PERM_TOP)
> > + return false;
> >
> > - if (neg_found)
> > - args[1] = SLP_TREE_CHILDREN (args[1])[0];
> > + complex_perm_kinds_t perm2 = linear_loads_p (perm_cache, op2);
> > + if (perm2 != kind2 && perm2 != PERM_TOP)
> > + return false;
> >
> > - return neg_found;
> > + return true;
> > }
> >
> > -/* Helper function to check if PERM is KIND or PERM_TOP. */
> > +enum _conj_status { CONJ_NONE, CONJ_FST, CONJ_SND };
> >
> > static inline bool
> > -is_eq_or_top (complex_perm_kinds_t perm, complex_perm_kinds_t kind)
> > +compatible_complex_nodes_p (slp_compat_nodes_map_t *compat_cache,
> > + slp_tree a, int *pa, slp_tree b, int *pb)
> > {
> > - return perm == kind || perm == PERM_TOP;
> > -}
> > + bool *tmp;
> > + std::pair<slp_tree, slp_tree> key = std::make_pair(a, b);
> > + if ((tmp = compat_cache->get (key)) != NULL)
> > + return *tmp;
> >
> > -/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
> > - nodes but also that they represent an operation that is either a complex
> > - multiplication or a complex multiplication by conjugated value.
> > + compat_cache->put (key, false);
> >
> > - Of the negation is expected to be in the first half of the tree (As required
> > - by an FMS pattern) then NEG_FIRST is true. If the operation is a conjugate
> > - operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
> > - second operand contains the conjugate operation. */
> > + if (SLP_TREE_CHILDREN (a).length () != SLP_TREE_CHILDREN (b).length ())
> > + return false;
> >
> > -static inline bool
> > -vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> > - const vec<slp_tree> &left_op,
> > - const vec<slp_tree> &right_op,
> > - bool neg_first, bool *conj_first_operand,
> > - bool fms)
> > -{
> > - /* The presence of a negation indicates that we have either a conjugate or a
> > - rotation. We need to distinguish which one. */
> > - *conj_first_operand = false;
> > - complex_perm_kinds_t kind;
> > -
> > - /* Complex conjugates have the negation on the imaginary part of the
> > - number where rotations affect the real component. So check if the
> > - negation is on a dup of lane 1. */
> > - if (fms)
> > + if (SLP_TREE_DEF_TYPE (a) != SLP_TREE_DEF_TYPE (b))
> > + return false;
> > +
> > + /* Only internal nodes can be loads, as such we can't check further if
> they
> > + are externals. */
> > + if (SLP_TREE_DEF_TYPE (a) != vect_internal_def)
> > {
> > - /* Canonicalization for fms is not consistent. So have to test both
> > - variants to be sure. This needs to be fixed in the mid-end so
> > - this part can be simpler. */
> > - kind = linear_loads_p (perm_cache, right_op[0]);
> > - if (!((is_eq_or_top (linear_loads_p (perm_cache, right_op[0]), PERM_ODDODD)
> > - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> > - PERM_ODDEVEN))
> > - || (kind == PERM_ODDEVEN
> > - && is_eq_or_top (linear_loads_p (perm_cache, right_op[1]),
> > - PERM_ODDODD))))
> > - return false;
> > + for (unsigned i = 0; i < SLP_TREE_SCALAR_OPS (a).length (); i++)
> > + {
> > + tree op1 = SLP_TREE_SCALAR_OPS (a)[pa[i % 2]];
> > + tree op2 = SLP_TREE_SCALAR_OPS (b)[pb[i % 2]];
> > + if (!operand_equal_p (op1, op2, 0))
> > + return false;
> > + }
> > +
> > + compat_cache->put (key, true);
> > + return true;
> > + }
> > +
> > + auto a_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (a));
> > + auto b_stmt = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (b));
> > +
> > + if (gimple_code (a_stmt) != gimple_code (b_stmt))
> > + return false;
> > +
> > + /* code, children, type, externals, loads, constants */
> > + if (gimple_num_args (a_stmt) != gimple_num_args (b_stmt))
> > + return false;
> > +
> > + /* At this point, a and b are known to be the same gimple operations. */
> > + if (is_gimple_call (a_stmt))
> > + {
> > + if (!compatible_calls_p (dyn_cast <gcall *> (a_stmt),
> > + dyn_cast <gcall *> (b_stmt)))
> > + return false;
> > }
> > + else if (!is_gimple_assign (a_stmt))
> > + return false;
> > else
> > {
> > - if (linear_loads_p (perm_cache, right_op[1]) != PERM_ODDODD
> > - && !is_eq_or_top (linear_loads_p (perm_cache, right_op[0]),
> > - PERM_ODDEVEN))
> > + tree_code acode = gimple_assign_rhs_code (a_stmt);
> > + tree_code bcode = gimple_assign_rhs_code (b_stmt);
> > + if ((acode == REALPART_EXPR || acode == IMAGPART_EXPR)
> > + && (bcode == REALPART_EXPR || bcode == IMAGPART_EXPR))
> > + return true;
> > +
> > + if (acode != bcode)
> > return false;
> > }
> >
> > - /* Deal with differences in indexes. */
> > - int index1 = fms ? 1 : 0;
> > - int index2 = fms ? 0 : 1;
> > -
> > - /* Check if the conjugate is on the second first or second operand. The
> > - order of the node with the conjugate value determines this, and the dup
> > - node must be one of lane 0 of the same DR as the neg node. */
> > - kind = linear_loads_p (perm_cache, left_op[index1]);
> > - if (kind == PERM_TOP)
> > + if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
> > + || !SLP_TREE_LOAD_PERMUTATION (b).exists ())
> > {
> > - if (linear_loads_p (perm_cache, left_op[index2]) == PERM_EVENODD)
> > - return true;
> > + for (unsigned i = 0; i < gimple_num_args (a_stmt); i++)
> > + {
> > + tree t1 = gimple_arg (a_stmt, i);
> > + tree t2 = gimple_arg (b_stmt, i);
> > + if (TREE_CODE (t1) != TREE_CODE (t2))
> > + return false;
> > +
> > + /* If SSA name then we will need to inspect the children
> > + so we can punt here. */
> > + if (TREE_CODE (t1) == SSA_NAME)
> > + continue;
> > +
> > + if (!operand_equal_p (t1, t2, 0))
> > + return false;
> > + }
> > }
> > - else if (kind == PERM_EVENODD && !neg_first)
> > + else
> > {
> > - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENEVEN)
> > + auto dr1 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (a));
> > + auto dr2 = STMT_VINFO_DATA_REF (SLP_TREE_REPRESENTATIVE (b));
> > + /* Don't check the last dimension as that's checked by the linearity
> > + checks. This check is also much stricter than what we need
> > + because it doesn't consider loading from adjacent elements
> > + in the same struct as loading from the same base object.
> > + But for now, I'll play it safe. */
> > + if (!same_data_refs (dr1, dr2, 1))
> > return false;
> > - return true;
> > }
> > - else if (kind == PERM_EVENEVEN && neg_first)
> > +
> > + for (unsigned i = 0; i < SLP_TREE_CHILDREN (a).length (); i++)
> > {
> > - if ((kind = linear_loads_p (perm_cache, left_op[index2])) != PERM_EVENODD)
> > + if (!compatible_complex_nodes_p (compat_cache,
> > + SLP_TREE_CHILDREN (a)[i], pa,
> > + SLP_TREE_CHILDREN (b)[i], pb))
> > return false;
> > -
> > - *conj_first_operand = true;
> > - return true;
> > }
> > - else
> > - return false;
> > -
> > - if (kind != PERM_EVENEVEN)
> > - return false;
> >
> > + compat_cache->put (key, true);
> > return true;
> > }
> >
> > -/* Helper function to help distinguish between a conjugate and a rotation in a
> > - complex multiplication. The operations have similar shapes but the order of
> > - the load permutes are different. This function returns TRUE when the order
> > - is consistent with a multiplication or multiplication by conjugated
> > - operand but returns FALSE if it's a multiplication by rotated operand. */
> > -
> > static inline bool
> > vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> > - const vec<slp_tree> &op,
> > - complex_perm_kinds_t permKind)
> > + slp_compat_nodes_map_t *compat_cache,
> > + vec<slp_tree> &left_op,
> > + vec<slp_tree> &right_op,
> > + bool subtract,
> > + enum _conj_status *_status)
> > {
> > - /* The left node is the more common case, test it first. */
> > - if (!is_eq_or_top (linear_loads_p (perm_cache, op[0]), permKind))
> > + auto_vec<slp_tree> ops;
> > + enum _conj_status stats = CONJ_NONE;
> > +
> > + /* The complex operations can occur in two layouts and two permute sequences
> > + so declare them and re-use them. */
> > + int styles[][4] = { { 0, 2, 1, 3} /* {L1, R1} + {L2, R2}. */
> > + , { 0, 3, 1, 2} /* {L1, R2} + {L2, R1}. */
> > + };
> > +
> > + /* Now for the corresponding permutes that go with these values. */
> > + complex_perm_kinds_t perms[][4]
> > + = { { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN }
> > + , { PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD }
> > + };
> > +
> > + /* These permutes are used during comparisons of externals on which
> > + we require strict equality. */
> > + int cq[][4][2]
> > + = { { { 0, 0 }, { 1, 1 }, { 0, 1 }, { 1, 0 } }
> > + , { { 0, 1 }, { 1, 0 }, { 0, 0 }, { 1, 1 } }
> > + };
> > +
> > + /* Default to style and perm 0, most operations use this one. */
> > + int style = 0;
> > + int perm = subtract ? 1 : 0;
> > +
> > + /* Check if we have a negate operation, if so absorb the node and continue
> > + looking. */
> > + bool neg0 = vect_match_expression_p (right_op[0], NEGATE_EXPR);
> > + bool neg1 = vect_match_expression_p (right_op[1], NEGATE_EXPR);
> > +
> > + /* Determine which style we're looking at. We only have different ones
> > + whenever a conjugate is involved. */
> > + if (neg0 && neg1)
> > + ;
> > + else if (neg0)
> > {
> > - if (!is_eq_or_top (linear_loads_p (perm_cache, op[1]), permKind))
> > - return false;
> > + right_op[0] = SLP_TREE_CHILDREN (right_op[0])[0];
> > + stats = CONJ_FST;
> > + if (subtract)
> > + perm = 0;
> > }
> > - return true;
> > + else if (neg1)
> > + {
> > + right_op[1] = SLP_TREE_CHILDREN (right_op[1])[0];
> > + stats = CONJ_SND;
> > + perm = 1;
> > + }
> > +
> > + *_status = stats;
> > +
> > + /* Flatten the inputs after we've remapped them. */
> > + ops.create (4);
> > + ops.safe_splice (left_op);
> > + ops.safe_splice (right_op);
> > +
> > + /* Extract out the elements to check. */
> > + slp_tree op0 = ops[styles[style][0]];
> > + slp_tree op1 = ops[styles[style][1]];
> > + slp_tree op2 = ops[styles[style][2]];
> > + slp_tree op3 = ops[styles[style][3]];
> > +
> > + /* Do cheapest test first. If failed no need to analyze further. */
> > + if (linear_loads_p (perm_cache, op0) != perms[perm][0]
> > + || linear_loads_p (perm_cache, op1) != perms[perm][1]
> > + || !is_eq_or_top (perm_cache, op2, perms[perm][2], op3, perms[perm][3]))
> > + return false;
> > +
> > + return compatible_complex_nodes_p (compat_cache, op0, cq[perm][0], op1,
> > + cq[perm][1])
> > + && compatible_complex_nodes_p (compat_cache, op2, cq[perm][2], op3,
> > + cq[perm][3]);
> > }
> >
> > /* This function combines two nodes containing only even and only odd lanes
> > @@ -908,11 +979,12 @@ class complex_mul_pattern : public complex_pattern
> > public:
> > void build (vec_info *);
> > static internal_fn
> > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> > - vec<slp_tree> *);
> > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> >
> > static vect_pattern*
> > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > + slp_tree *);
> >
> > static vect_pattern*
> > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > @@ -943,6 +1015,7 @@ class complex_mul_pattern : public complex_pattern
> > internal_fn
> > complex_mul_pattern::matches (complex_operation_t op,
> > slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t *compat_cache,
> > slp_tree *node, vec<slp_tree> *ops)
> > {
> > internal_fn ifn = IFN_LAST;
> > @@ -990,17 +1063,13 @@ complex_mul_pattern::matches (complex_operation_t op,
> > || linear_loads_p (perm_cache, left_op[1]) == PERM_ODDEVEN)
> > return IFN_LAST;
> >
> > - bool neg_first = false;
> > - bool conj_first_operand = false;
> > - bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> > + enum _conj_status status;
> > + if (!vect_validate_multiplication (perm_cache, compat_cache, left_op,
> > + right_op, false, &status))
> > + return IFN_LAST;
> >
> > - if (!is_neg)
> > + if (status == CONJ_NONE)
> > {
> > - /* A multiplication needs to multiply agains the real pair, otherwise
> > - the pattern matches that of FMS. */
> > - if (!vect_validate_multiplication (perm_cache, left_op, PERM_EVENEVEN)
> > - || vect_normalize_conj_loc (left_op))
> > - return IFN_LAST;
> > if (add0)
> > ifn = IFN_COMPLEX_FMA;
> > else
> > @@ -1008,11 +1077,6 @@ complex_mul_pattern::matches (complex_operation_t op,
> > }
> > else
> > {
> > - if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> > - neg_first, &conj_first_operand,
> > - false))
> > - return IFN_LAST;
> > -
> > if(add0)
> > ifn = IFN_COMPLEX_FMA_CONJ;
> > else
> > @@ -1029,19 +1093,13 @@ complex_mul_pattern::matches (complex_operation_t op,
> > ops->quick_push (add0);
> >
> > complex_perm_kinds_t kind = linear_loads_p (perm_cache, left_op[0]);
> > - if (kind == PERM_EVENODD)
> > - {
> > - ops->quick_push (left_op[1]);
> > - ops->quick_push (right_op[1]);
> > - ops->quick_push (left_op[0]);
> > - }
> > - else if (kind == PERM_TOP)
> > + if (kind == PERM_EVENODD || kind == PERM_TOP)
> > {
> > ops->quick_push (left_op[1]);
> > ops->quick_push (right_op[1]);
> > ops->quick_push (left_op[0]);
> > }
> > - else if (kind == PERM_EVENEVEN && !conj_first_operand)
> > + else if (kind == PERM_EVENEVEN && status != CONJ_SND)
> > {
> > ops->quick_push (left_op[0]);
> > ops->quick_push (right_op[0]);
> > @@ -1061,13 +1119,14 @@ complex_mul_pattern::matches (complex_operation_t op,
> >
> > vect_pattern*
> > complex_mul_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t *compat_cache,
> > slp_tree *node)
> > {
> > auto_vec<slp_tree> ops;
> > complex_operation_t op
> > = vect_detect_pair_op (*node, true, &ops);
> > internal_fn ifn
> > - = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> > + = complex_mul_pattern::matches (op, perm_cache, compat_cache, node, &ops);
> > if (ifn == IFN_LAST)
> > return NULL;
> >
> > @@ -1097,8 +1156,8 @@ complex_mul_pattern::build (vec_info *vinfo)
> >
> > /* First re-arrange the children. */
> > SLP_TREE_CHILDREN (*this->m_node).reserve_exact (2);
> > - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[2];
> > - SLP_TREE_CHILDREN (*this->m_node)[1] = newnode;
> > + SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
> > + SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[2];
> > break;
> > }
> > case IFN_COMPLEX_FMA:
> > @@ -1115,9 +1174,9 @@ complex_mul_pattern::build (vec_info *vinfo)
> >
> > /* First re-arrange the children. */
> > SLP_TREE_CHILDREN (*this->m_node).safe_grow (3);
> > - SLP_TREE_CHILDREN (*this->m_node)[0] = this->m_ops[0];
> > + SLP_TREE_CHILDREN (*this->m_node)[0] = newnode;
> > SLP_TREE_CHILDREN (*this->m_node)[1] = this->m_ops[3];
> > - SLP_TREE_CHILDREN (*this->m_node)[2] = newnode;
> > + SLP_TREE_CHILDREN (*this->m_node)[2] = this->m_ops[0];
> >
> > /* Tell the builder to expect an extra argument. */
> > this->m_num_args++;
> > @@ -1147,11 +1206,12 @@ class complex_fms_pattern : public complex_pattern
> > public:
> > void build (vec_info *);
> > static internal_fn
> > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> > - vec<slp_tree> *);
> > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> >
> > static vect_pattern*
> > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > + slp_tree *);
> >
> > static vect_pattern*
> > mkInstance (slp_tree *node, vec<slp_tree> *m_ops, internal_fn ifn)
> > @@ -1182,6 +1242,7 @@ class complex_fms_pattern : public complex_pattern
> > internal_fn
> > complex_fms_pattern::matches (complex_operation_t op,
> > slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t *compat_cache,
> > slp_tree * ref_node, vec<slp_tree> *ops)
> > {
> > internal_fn ifn = IFN_LAST;
> > @@ -1197,6 +1258,8 @@ complex_fms_pattern::matches (complex_operation_t op,
> > if (!vect_match_expression_p (root, MINUS_EXPR))
> > return IFN_LAST;
> >
> > + /* TODO: Support invariants here, with the new layout CADD now
> > + can match before we get a chance to try CFMS. */
> > auto nodes = SLP_TREE_CHILDREN (root);
> > if (!vect_match_expression_p (nodes[1], MULT_EXPR)
> > || vect_detect_pair_op (nodes[0]) != PLUS_MINUS)
> > @@ -1217,16 +1280,14 @@ complex_fms_pattern::matches (complex_operation_t op,
> > || !vect_match_expression_p (l0node[1], MULT_EXPR))
> > return IFN_LAST;
> >
> > - bool is_neg = vect_normalize_conj_loc (left_op);
> > -
> > - bool conj_first_operand = false;
> > - if (!vect_validate_multiplication (perm_cache, right_op, left_op, false,
> > - &conj_first_operand, true))
> > + enum _conj_status status;
> > + if (!vect_validate_multiplication (perm_cache, compat_cache, right_op,
> > + left_op, true, &status))
> > return IFN_LAST;
> >
> > - if (!is_neg)
> > + if (status == CONJ_NONE)
> > ifn = IFN_COMPLEX_FMS;
> > - else if (is_neg)
> > + else
> > ifn = IFN_COMPLEX_FMS_CONJ;
> >
> > if (!vect_pattern_validate_optab (ifn, *ref_node))
> > @@ -1243,26 +1304,12 @@ complex_fms_pattern::matches (complex_operation_t op,
> > ops->quick_push (right_op[1]);
> > ops->quick_push (left_op[1]);
> > }
> > - else if (kind == PERM_TOP)
> > - {
> > - ops->quick_push (l0node[0]);
> > - ops->quick_push (right_op[1]);
> > - ops->quick_push (right_op[0]);
> > - ops->quick_push (left_op[0]);
> > - }
> > - else if (kind == PERM_EVENEVEN && !is_neg)
> > - {
> > - ops->quick_push (l0node[0]);
> > - ops->quick_push (right_op[1]);
> > - ops->quick_push (right_op[0]);
> > - ops->quick_push (left_op[0]);
> > - }
> > else
> > {
> > ops->quick_push (l0node[0]);
> > ops->quick_push (right_op[1]);
> > ops->quick_push (right_op[0]);
> > - ops->quick_push (left_op[1]);
> > + ops->quick_push (left_op[0]);
> > }
> >
> > return ifn;
> > @@ -1272,13 +1319,14 @@ complex_fms_pattern::matches (complex_operation_t op,
> >
> > vect_pattern*
> > complex_fms_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t *compat_cache,
> > slp_tree *node)
> > {
> > auto_vec<slp_tree> ops;
> > complex_operation_t op
> > = vect_detect_pair_op (*node, true, &ops);
> > internal_fn ifn
> > - = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> > + = complex_fms_pattern::matches (op, perm_cache, compat_cache, node, &ops);
> > if (ifn == IFN_LAST)
> > return NULL;
> >
> > @@ -1305,9 +1353,24 @@ complex_fms_pattern::build (vec_info *vinfo)
> > SLP_TREE_CHILDREN (*this->m_node).create (3);
> >
> > /* First re-arrange the children. */
> > + switch (this->m_ifn)
> > + {
> > + case IFN_COMPLEX_FMS:
> > + {
> > + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> > + SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> > + break;
> > + }
> > + case IFN_COMPLEX_FMS_CONJ:
> > + {
> > + SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> > + SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> > + break;
> > + }
> > + default:
> > + gcc_unreachable ();
> > + }
> > SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[0]);
> > - SLP_TREE_CHILDREN (*this->m_node).quick_push (this->m_ops[1]);
> > - SLP_TREE_CHILDREN (*this->m_node).quick_push (newnode);
> >
> > /* And then rewrite the node itself. */
> > complex_pattern::build (vinfo);
> > @@ -1334,11 +1397,12 @@ class complex_operations_pattern : public complex_pattern
> > public:
> > void build (vec_info *);
> > static internal_fn
> > - matches (complex_operation_t op, slp_tree_to_load_perm_map_t *, slp_tree *,
> > - vec<slp_tree> *);
> > + matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> > + slp_compat_nodes_map_t *, slp_tree *, vec<slp_tree> *);
> >
> > static vect_pattern*
> > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > + slp_tree *);
> > };
> >
> > /* Dummy matches implementation for proxy object. */
> > @@ -1347,6 +1411,7 @@ internal_fn
> > complex_operations_pattern::
> > matches (complex_operation_t /* op */,
> > slp_tree_to_load_perm_map_t * /* perm_cache */,
> > + slp_compat_nodes_map_t * /* compat_cache */,
> > slp_tree * /* ref_node */, vec<slp_tree> * /* ops */)
> > {
> > return IFN_LAST;
> > @@ -1356,6 +1421,7 @@ matches (complex_operation_t /* op */,
> >
> > vect_pattern*
> > complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t *ccache,
> > slp_tree *node)
> > {
> > auto_vec<slp_tree> ops;
> > @@ -1363,15 +1429,15 @@ complex_operations_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > = vect_detect_pair_op (*node, true, &ops);
> > internal_fn ifn = IFN_LAST;
> >
> > - ifn = complex_fms_pattern::matches (op, perm_cache, node, &ops);
> > + ifn = complex_fms_pattern::matches (op, perm_cache, ccache, node, &ops);
> > if (ifn != IFN_LAST)
> > return complex_fms_pattern::mkInstance (node, &ops, ifn);
> >
> > - ifn = complex_mul_pattern::matches (op, perm_cache, node, &ops);
> > + ifn = complex_mul_pattern::matches (op, perm_cache, ccache, node, &ops);
> > if (ifn != IFN_LAST)
> > return complex_mul_pattern::mkInstance (node, &ops, ifn);
> >
> > - ifn = complex_add_pattern::matches (op, perm_cache, node, &ops);
> > + ifn = complex_add_pattern::matches (op, perm_cache, ccache, node, &ops);
> > if (ifn != IFN_LAST)
> > return complex_add_pattern::mkInstance (node, &ops, ifn);
> >
> > @@ -1398,11 +1464,13 @@ class addsub_pattern : public vect_pattern
> > void build (vec_info *);
> >
> > static vect_pattern*
> > - recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > + recognize (slp_tree_to_load_perm_map_t *, slp_compat_nodes_map_t *,
> > + slp_tree *);
> > };
> >
> > vect_pattern *
> > -addsub_pattern::recognize (slp_tree_to_load_perm_map_t *, slp_tree *node_)
> > +addsub_pattern::recognize (slp_tree_to_load_perm_map_t *,
> > + slp_compat_nodes_map_t *, slp_tree *node_)
> > {
> > slp_tree node = *node_;
> > if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > index b912c3577df61a694d5bb9e22c5303fe6a48ab6e..cb577f8a612d583254e42bb06a6d7a0875de5e75 100644
> > --- a/gcc/tree-vect-slp.c
> > +++ b/gcc/tree-vect-slp.c
> > @@ -804,7 +804,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
> > /* Return true if call statements CALL1 and CALL2 are similar enough
> > to be combined into the same SLP group. */
> >
> > -static bool
> > +bool
> > compatible_calls_p (gcall *call1, gcall *call2)
> > {
> > unsigned int nargs = gimple_call_num_args (call1);
> > @@ -2907,6 +2907,7 @@ optimize_load_redistribution (scalar_stmts_to_slp_tree_map_t *bst_map,
> > static bool
> > vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> > slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t *compat_cache,
> > hash_set<slp_tree> *visited)
> > {
> > unsigned i;
> > @@ -2918,11 +2919,13 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> > slp_tree child;
> > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN (node)[i],
> > - vinfo, perm_cache, visited);
> > + vinfo, perm_cache, compat_cache,
> > + visited);
> >
> > for (unsigned x = 0; x < num__slp_patterns; x++)
> > {
> > - vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
> > + vect_pattern *pattern
> > + = slp_patterns[x] (perm_cache, compat_cache, ref_node);
> > if (pattern)
> > {
> > pattern->build (vinfo);
> > @@ -2943,7 +2946,8 @@ vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> > static bool
> > vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> > hash_set<slp_tree> *visited,
> > - slp_tree_to_load_perm_map_t *perm_cache)
> > + slp_tree_to_load_perm_map_t *perm_cache,
> > + slp_compat_nodes_map_t *compat_cache)
> > {
> > DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> > @@ -2953,7 +2957,8 @@ vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> > "Analyzing SLP tree %p for patterns\n",
> > SLP_INSTANCE_TREE (instance));
> >
> > - return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, visited);
> > + return vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache, compat_cache,
> > + visited);
> > }
> >
> > /* STMT_INFO is a store group of size GROUP_SIZE that we are considering
> > @@ -3437,12 +3442,14 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
> >
> > hash_set<slp_tree> visited_patterns;
> > slp_tree_to_load_perm_map_t perm_cache;
> > + slp_compat_nodes_map_t compat_cache;
> >
> > /* See if any patterns can be found in the SLP tree. */
> > bool pattern_found = false;
> > FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
> > pattern_found |= vect_match_slp_patterns (instance, vinfo,
> > - &visited_patterns, &perm_cache);
> > + &visited_patterns, &perm_cache,
> > + &compat_cache);
> >
> > /* If any were found optimize permutations of loads. */
> > if (pattern_found)
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index 2f6e1e268fb07e9de065ff9c45af87546e565d66..83cd0919c7838c65576e1debd881e0ec636a605a 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -2268,6 +2268,7 @@ extern void duplicate_and_interleave (vec_info *, gimple_seq *, tree,
> > extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
> > extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
> > extern void vect_free_slp_tree (slp_tree);
> > +extern bool compatible_calls_p (gcall *, gcall *);
> >
> > /* In tree-vect-patterns.c. */
> > extern void
> > @@ -2306,6 +2307,12 @@ typedef enum _complex_perm_kinds {
> > typedef hash_map <slp_tree, complex_perm_kinds_t>
> > slp_tree_to_load_perm_map_t;
> >
> > +/* Cache from nodes pair to being compatible or not. */
> > +typedef pair_hash <nofree_ptr_hash <_slp_tree>,
> > + nofree_ptr_hash <_slp_tree>> slp_node_hash;
> > +typedef hash_map <slp_node_hash, bool> slp_compat_nodes_map_t;
> > +
> > +
> > /* Vector pattern matcher base class. All SLP pattern matchers must inherit
> > from this type. */
> >
> > @@ -2338,7 +2345,8 @@ class vect_pattern
> > public:
> >
> > /* Create a new instance of the pattern matcher class of the given type. */
> > - static vect_pattern* recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > + static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
> > + slp_compat_nodes_map_t *, slp_tree *);
> >
> > /* Build the pattern from the data collected so far. */
> > virtual void build (vec_info *) = 0;
> > @@ -2352,6 +2360,7 @@ class vect_pattern
> >
> > /* Function pointer to create a new pattern matcher from a generic type. */
> > typedef vect_pattern* (*vect_pattern_decl_t) (slp_tree_to_load_perm_map_t *,
> > + slp_compat_nodes_map_t *,
> > slp_tree *);
> >
> > /* List of supported pattern matchers. */
> >
> >
> > --
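The order of checks in the new `vect_validate_multiplication` can be illustrated outside GCC, together with the pair-keyed compatibility cache that is now threaded through every `recognize`/`matches` call. The checks run in this order: flatten the operands, pick them through the `styles` table, test the load permutes first, then run the memoized compatibility checks. The sketch below is a standalone C++ approximation, not GCC code. `Node`, `PermKind`, `value_id`, `eq_or_top`, `compatible`, `validate` and `compat_computations` are hypothetical stand-ins, and `std::unordered_map` replaces GCC's `hash_map`/`pair_hash`. The compatibility criterion is a placeholder, because `compatible_complex_nodes_p` and the lane pairs in `cq[]` are not part of this hunk. The `eq_or_top` semantics (an observed permute matches if it is equal to the expected one or is `PERM_TOP`) are also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins, NOT GCC's real types: PermKind mirrors
// complex_perm_kinds_t and Node mirrors an SLP node.
enum PermKind { PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN,
                PERM_TOP };

struct Node {
  PermKind perm;  // what linear_loads_p would report for this node
  int value_id;   // hypothetical: equal ids mean "same computation"
};

// Operand layouts as indices into {L1, L2, R1, R2} (styles[] in the patch).
constexpr int kStyles[2][4] = {{0, 2, 1, 3}, {0, 3, 1, 2}};

// Expected permutes for each permute sequence (perms[] in the patch).
constexpr PermKind kPerms[2][4] = {
    {PERM_EVENEVEN, PERM_ODDODD, PERM_EVENODD, PERM_ODDEVEN},
    {PERM_EVENODD, PERM_ODDEVEN, PERM_EVENEVEN, PERM_ODDODD}};

// Assumed semantics: an observed permute matches if equal or PERM_TOP.
static bool eq_or_top(PermKind seen, PermKind want) {
  return seen == want || seen == PERM_TOP;
}

// Ordered pointer-pair key: (a, b) and (b, a) are cached separately.
struct PairHash {
  std::size_t operator()(const std::pair<const Node*, const Node*>& p) const {
    std::size_t h1 = std::hash<const Node*>{}(p.first);
    std::size_t h2 = std::hash<const Node*>{}(p.second);
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};
using CompatCache =
    std::unordered_map<std::pair<const Node*, const Node*>, bool, PairHash>;

int compat_computations = 0;  // counts cache misses, for illustration only

// Memoized pairwise check; "same value_id" is a placeholder criterion.
static bool compatible(CompatCache& cache, const Node* a, const Node* b) {
  auto key = std::make_pair(a, b);
  auto it = cache.find(key);
  if (it != cache.end()) return it->second;
  ++compat_computations;
  bool ok = a->value_id == b->value_id;
  cache.emplace(key, ok);
  return ok;
}

// Flatten left then right, select operands through the style table,
// do the cheap permute tests first, then the cached compatibility tests.
static bool validate(CompatCache& cache, const Node* left[2],
                     const Node* right[2], int style, int perm) {
  const Node* ops[4] = {left[0], left[1], right[0], right[1]};
  const Node* op0 = ops[kStyles[style][0]];
  const Node* op1 = ops[kStyles[style][1]];
  const Node* op2 = ops[kStyles[style][2]];
  const Node* op3 = ops[kStyles[style][3]];

  if (op0->perm != kPerms[perm][0] || op1->perm != kPerms[perm][1] ||
      !eq_or_top(op2->perm, kPerms[perm][2]) ||
      !eq_or_top(op3->perm, kPerms[perm][3]))
    return false;

  return compatible(cache, op0, op1) && compatible(cache, op2, op3);
}
```

With `style = 0` and `perm = 0` the operands are paired as {L1, R1} and {L2, R2}. Validating the same nodes a second time is answered entirely from the cache. A permute mismatch returns before any compatibility work is done.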