From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from mx2.suse.de (mx2.suse.de [195.135.220.15])
 by sourceware.org (Postfix) with ESMTPS id 282313898004
 for <gcc-patches@gcc.gnu.org>; Tue, 24 Nov 2020 11:37:30 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 282313898004
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=rguenther@suse.de
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.221.27])
 by mx2.suse.de (Postfix) with ESMTP id CE597AC2E;
 Tue, 24 Nov 2020 11:37:28 +0000 (UTC)
Date: Tue, 24 Nov 2020 11:37:28 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <Tamar.Christina@arm.com>
cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>, 
 "ook@ucw.cz" <ook@ucw.cz>, "hongtao.liu@intel.com" <hongtao.liu@intel.com>
Subject: RE: [PATCH] middle-end: Support complex Addition
In-Reply-To: <VI1PR08MB5325E280F33E7751BC87BD14FFFB0@VI1PR08MB5325.eurprd08.prod.outlook.com>
Message-ID: <nycvar.YFH.7.77.849.2011241122220.7048@jbgna.fhfr.qr>
References: <patch-13812-tamar@arm.com>
 <nycvar.YFH.7.77.849.2011231455590.7048@jbgna.fhfr.qr>
 <VI1PR08MB5325FF9537CCEFD3A3849304FFFC0@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.77.849.2011240847350.7048@jbgna.fhfr.qr> 
 <VI1PR08MB5325E280F33E7751BC87BD14FFFB0@VI1PR08MB5325.eurprd08.prod.outlook.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_ASCII_DIVIDERS, KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT,
 RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Nov 2020 11:37:43 -0000

On Tue, 24 Nov 2020, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, November 24, 2020 9:30 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> > hongtao.liu@intel.com
> > Subject: RE: [PATCH] middle-end: Support complex Addition
> > 
> > On Mon, 23 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Monday, November 23, 2020 3:51 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> > > > hongtao.liu@intel.com
> > > > Subject: Re: [PATCH] middle-end: Support complex Addition
> > > >
> > > > On Mon, 23 Nov 2020, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch adds support for
> > > > >
> > > > >   * Complex Addition with rotation of 90 and 270.
> > > > >
> > > > >   Addition with rotation of the second argument around the Argand
> > plane.
> > > > >     Supported rotations are 90 and 180.
> > > > >
> > > > >     c = a + (b * I) and c = a + (b * I * I * I)
> > > > >
> > > > > For the full code I have pushed a branch at
> > > > refs/users/tnfchris/heads/complex-numbers.
> > > > >
> > > > > As a side note, I still needed to set
> > > > >
> > > > > STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > > > >
> > > > > as the new hybrid detection code only runs for loop aware SLP.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues, but
> > > > > sorting out the testcases as TCL is processed before the CPP..
> > > > >
> > > > > Ok for master?
> > > >
> > > > So I failed to apply this patch (and after manual fixup build).
> > > > I went ahead and checked out the branch, patching the tree with
> > > > x86 support for cadd90 with -msse3 or -mavx2 using the attached
> > > > patch.
> > > >
> > >
> > > It requires a patch you have previously approved pending the rest so it's
> > not committed yet ?
> > 
> > Ah, I missed that.
> > 
> > > > For
> > > >
> > > > double c[1024], b[1024], a[1024];
> > > >
> > > > void foo ()
> > > > {
> > > >   for (int i = 0; i < 512; ++i)
> > > >     {
> > > >       c[2*i] = a[2*i] - b[2*i+1];
> > > >       c[2*i+1] = a[2*i+1] + b[2*i];
> > > >     }
> > > > }
> > > >
> > > > I then see
> > > >
> > > > t.c:5:21: note:    Analyzing SLP tree 0x39c0010 for patterns
> > > > t.c:5:21: note:    Found COMPLEX_ADD_ROT90 pattern in SLP tree
> > > > t.c:5:21: note:    Target supports COMPLEX_ADD_ROT90 vectorization
> > with
> > > > mode vector(2) double
> > > > t.c:5:21: note:    Pattern matched SLP tree
> > > > t.c:5:21: note:    node 0x39c0010 (max_nunits=2, refcnt=2)
> > > > t.c:5:21: note:    op template: c[_1] = _5;
> > > > t.c:5:21: note:         stmt 0 c[_1] = _5;
> > > > t.c:5:21: note:         stmt 1 c[_3] = _8;
> > > > t.c:5:21: note:         children 0x39c0080
> > > > t.c:5:21: note:    node 0x39c0080 (max_nunits=2, refcnt=2)
> > > > t.c:5:21: note:    op template: slp_patt_29 = .COMPLEX_ADD_ROT90 (_5,
> > _5);
> > > > t.c:5:21: note:         stmt 0 _5 = _2 - _4;
> > > > t.c:5:21: note:         stmt 1 _8 = _6 + _7;
> > > > t.c:5:21: note:         lane permutation { 0[0] 1[1] }
> > > > t.c:5:21: note:         children 0x39c00f0 0x39c02b0
> > > > t.c:5:21: note:    node 0x39c00f0 (max_nunits=2, refcnt=2)
> > > > t.c:5:21: note:    op template: _2 = a[_1];
> > > > t.c:5:21: note:         stmt 0 _2 = a[_1];
> > > > t.c:5:21: note:         stmt 1 _6 = a[_3];
> > > > t.c:5:21: note:         load permutation { 0 1 }
> > > > t.c:5:21: note:    node 0x39c02b0 (max_nunits=1, refcnt=1)
> > > > t.c:5:21: note:    op: VEC_PERM_EXPR
> > > > t.c:5:21: note:         { }
> > > > t.c:5:21: note:         lane permutation { 0[1] 0[0] }
> > > > t.c:5:21: note:         children 0x39c0160
> > > > t.c:5:21: note:    node 0x39c0160 (max_nunits=2, refcnt=2)
> > > > t.c:5:21: note:    op template: _4 = b[_3];
> > > > t.c:5:21: note:         stmt 0 _4 = b[_3];
> > > > t.c:5:21: note:         stmt 1 _7 = b[_1];
> > > > t.c:5:21: note:         load permutation { 1 0 }
> > > >
> > > > I'm confused about the lane permutation in the .COMPLEX_ADD_ROT90
> > > > node (I guess this permutation is simply ignored by code-generation).
> > > > Should it not be there?
> > >
> > > Yes, I had completely missed that. I forgot to blank it out.
> > 
> > Btw, in this context
> > 
> >       /* Unfortunately still need this on the new pattern because non-loop
> > SLP
> >          doesn't call vect_detect_hybrid_slp so it never updates it.  */
> >       STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > 
> > this isnt' about the hybrid marker but about vect_mark_slp_stmts
> > which marks all stmts participating in the SLP graph with pure_slp
> > which only marks SLP_TREE_SCALAR_STMTS but not
> > SLP_TREE_REPRESENTATIVE.
> > I think that's OK and thus the above setting of pure_slp is OK as well,
> > just the comment is off.  Maybe make it "Make sure to mark the
> > representative statement pure_slp and relevant".
> > 
> > > >
> > > > Otherwise the outcome is now as expected.  Permute optimization
> > > > later produces
> > > >
> > > > t.c:5:21: note:   node 0x39c0080 (max_nunits=2, refcnt=1)
> > > > t.c:5:21: note:   op template: slp_patt_29 = .COMPLEX_ADD_ROT90 (_5,
> > _5);
> > > > t.c:5:21: note:         stmt 0 _5 = _2 - _4;
> > > > t.c:5:21: note:         stmt 1 _8 = _6 + _7;
> > > > t.c:5:21: note:         lane permutation { 0[0] 1[1] }
> > > > t.c:5:21: note:         children 0x39c00f0 0x39c02b0
> > > > ...
> > > > t.c:5:21: note:   node 0x39c02b0 (max_nunits=1, refcnt=1)
> > > > t.c:5:21: note:   op: VEC_PERM_EXPR
> > > > t.c:5:21: note:         { }
> > > > t.c:5:21: note:         lane permutation { 0[0] 0[1] }
> > > > t.c:5:21: note:         children 0x39c0160
> > > > t.c:5:21: note:   node 0x39c0160 (max_nunits=2, refcnt=1)
> > > > t.c:5:21: note:   op template: _4 = b[_3];
> > > > t.c:5:21: note:         stmt 0 _7 = b[_1];
> > > > t.c:5:21: note:         stmt 1 _4 = b[_3];
> > > >
> > > > where the noop permute is correctly costed (and thus is just a
> > > > cosmetic annoyance):
> > > >
> > > > 0x3a13870 a[_1] 1 times vector_load costs 12 in body
> > > > 0x3a13870 b[_1] 1 times vector_load costs 12 in body
> > > > 0x3a13870 <unknown> 0 times vec_perm costs 0 in body
> > > > 0x3a13870 .COMPLEX_ADD_ROT90 (_5, _5) 1 times vector_stmt costs 12
> > in
> > > > body
> > > > 0x3a13870 _5 1 times vector_store costs 12 in body
> > > >
> > > > Code generated is also superior (-msse3):
> > > >
> > > > .L2:
> > > >         movapd  a(%rax), %xmm0
> > > >         addsubpd        b(%rax), %xmm0
> > > >         addq    $16, %rax
> > > >         movaps  %xmm0, c-16(%rax)
> > > >         cmpq    $8192, %rax
> > > >         jne     .L2
> > > >
> > > > compared to GCC 10 where we have an extra permute
> > > >
> > > > .L2:
> > > >         movapd  b(%rax), %xmm0
> > > >         movapd  a(%rax), %xmm1
> > > >         addq    $16, %rax
> > > >         shufpd  $1, %xmm0, %xmm0
> > > >         addsubpd        %xmm0, %xmm1
> > > >         movaps  %xmm1, c-16(%rax)
> > > >         cmpq    $8192, %rax
> > > >         jne     .L2
> > > >
> > > > which of course makes me wonder whether I have done the x86
> > > > support correctly.  Ah, I have not.  The x86 instructions
> > > > do not embed the even/odd lane swap, they just do the mixed
> > > > sign operation.  So for those we'd need additional optabs
> > > > and patterns then.
> > > >
> > > > So I see the branch contains only the complex add so I'm
> > > > going through the changes there:
> > >
> > > Yes I'm still updating MUL, FMA and FMS are tiny extensions to MUL.
> > >
> > > >
> > > >  /* Create an SLP node for SCALAR_STMTS.  */
> > > >
> > > > -static slp_tree
> > > > +slp_tree
> > > >  vect_create_new_slp_node (slp_tree node,
> > > >                           vec<stmt_vec_info> scalar_stmts, unsigned nops)
> > > >  {
> > > >    SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
> > > >    SLP_TREE_CHILDREN (node).create (nops);
> > > >    SLP_TREE_DEF_TYPE (node) = vect_internal_def;
> > > > -  SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
> > > > -  SLP_TREE_LANES (node) = scalar_stmts.length ();
> > > > +  if (scalar_stmts.exists ())
> > > > +    {
> > > > +      SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
> > > > +      SLP_TREE_LANES (node) = scalar_stmts.length ();
> > > > +    }
> > > >    return node;
> > > >  }
> > > >
> > > > so I don't like that very much, I guess we instead want a
> > > >
> > > > vect_create_new_perm_node (slp_node node, nops)
> > > >
> > > > which can pre-fill SLP_TREE_CODE.
> > > >
> > > > You add testsuite/gcc.dg/vect/complex/ but there's neither an
> > > > .exp file in it nor is it sourced from vect.exp - I suppose
> > > > some bits are missing here on the branch?
> > >
> > > Ugg, sorry... I forgot a git add...
> > >
> > > >
> > > > +typedef enum _complex_operation : unsigned {
> > > >
> > > > uh, oh - C++ I don't know.  Is : unsigned required?
> > > >
> > >
> > > It requires an enum base, so either enum E : int or enum class E,
> > > which apparently defaults to int.
> > >
> > > >
> > > > +/* Check to see if all loads rooted in ROOT are linear.  Linearity is
> > > > +   defined as having no gaps between values loaded.  */
> > > >
> > > > what is actually returned?
> > >
> > > It returns the load permute that the node being inspected would produce.
> > > Or rather, it shows how the data flows through the tree rooted at that
> > node.
> > >
> > > It's used a to determine if the operation being done does the odd/even
> > lane
> > > swapping.  This becomes more important for MUL as I need to distinguish
> > between
> > > a conjucate and a rotation.  Both of which produce just a negate node, but
> > what they
> > > negate determines what the operation is.
> > 
> > So it basically computes what optimize_slp does in its dataflow of
> > permutes?  But you do
> 
> Yes, unfortunately.. 
> 
> > 
> >   auto_vec<load_permutation_t> all_loads;
> >   bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> > 
> >   slp_tree child;
> >   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> >     {
> >       loads = linear_loads_p (perm_cache, child, linear);
> >       if ((!*linear && !is_perm) || !loads.exists ())
> >         return loads;
> > 
> > so when there's a branch in the SLP graph and either one is
> > not linear you return the permute on that branch?  Or if there
> > isn't any permute on one branch you return that.  Whatever comes
> > first?  The code misses at least comments explaining on what
> > it computes for the root of a SLP subgraph (note the graph can
> > now be cyclic as to where I don't really see how that is handled
> > here).  (**)
> 
> No the branch is handled, it only bails out when it's not linear and the 
> node isn't A VEC_PERM.  So if you have for instance an ADD where one of 
> the is not linear then it doesn't care anymore.. Which indeed isn't 
> optimal, but it just rejects the case.

Err, but you just declare it not linear then which means it matches
the "permuted" operand for a cadd and the transform stage then
simply inserts the anticipated 'reverse' permute for that operand,
expecting permute optimization to get rid of.  That is, I think
you'd happily pattern match

 a[0] = a[0] + b[1];
 a[1] = a[1] - b[2];
 a[2] = a[2] + b[3];
 a[3] = a[3] - b[0];

because you can linearize it and it isn't linear?

> > 
> > > >
> > > > +static load_permutation_t
> > > > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree
> > > > root,
> > > > +               bool *linear)
> > > > +{
> > > > ...
> > > > +  else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> > > > +    {
> > > > +       loads.create (SLP_TREE_LANES (root));
> > > >
> > > > it's weird that you need to dig into vect_external_defs - if the
> > > > vectorizer for whatever reason decided to not make the defs internal
> > > > you shouldn't pick them up here?
> > >
> > > I do so because for the purposes of these instructions you need to have an
> > > alternating sequence. If you say have the same externals { _a , _a } that
> > operation
> > > isn't what the instruction expects. Accepting random externals was also
> > causing ICEs
> > > when compiling SPECFP 2017 but didn't look too deeply into this as I
> > couldn't convince
> > > myself that it should match these.
> > 
> > Did you actually run into a testcase with external loads?
> 
> Yes, quite a few of them actually. WRF and Blender in spec2017 FP have loads of
> these +/- pairs (~200 of them).
> 
> But a simple case is caxpy from LAPACK and BLAS
> 
> #include <stdlib.h>
> #include <complex.h>
> 
> void caxpy_sub(double complex * restrict y, double complex * restrict x, size_t N, double complex f) {
>   for (size_t i = 0; i < N; ++i)
>     y[i] -= x[i]*f;
> }
> 
> void caxpy_plus(double complex * restrict y, double complex * restrict x, size_t N, double complex f) {
>   for (size_t i = 0; i < N; ++i)
>     y[i] += x[i]*f;
> }
> 
> Which match FMA and FMS, but f is not a load.

OK, I see.  But it will break down if f is a complex constant or the
result of any other complex operation.  So I think it's a bit of a
hack - instead externals should be matched as "any way you like"
permute since we can simply code-gen any reverse permute into it
at code-generation time.
 
> The code in the patch set (the full one) generated for this before was
> 
> caxpy_sub:
>         cbz     x2, .L1
>         ins     v0.d[1], v1.d[0]
>         lsl     x3, x2, 4
>         mov     x2, 0
> .L3:
>         ldr     q1, [x0, x2]
>         ldr     q2, [x1, x2]
>         fcmla   v1.2d, v2.2d, v0.2d, #270
>         fcmla   v1.2d, v2.2d, v0.2d, #180
>         str     q1, [x0, x2]
>         add     x2, x2, 16
>         cmp     x2, x3
>         bne     .L3
> .L1:
>         ret
> 
> caxpy_plus:
>         cbz     x2, .L9
>         ins     v0.d[1], v1.d[0]
>         lsl     x3, x2, 4
>         mov     x2, 0
> .L11:
>         ldr     q1, [x0, x2]
>         ldr     q2, [x1, x2]
>         fcmla   v1.2d, v2.2d, v0.2d, #0
>         fcmla   v1.2d, v2.2d, v0.2d, #90
>         str     q1, [x0, x2]
>         add     x2, x2, 16
>         cmp     x2, x3
>         bne     .L11
> .L9:
>         ret
> 
> > 
> > > >
> > > > +  typedef const std::pair<unsigned, unsigned>* cmp_t;
> > > > +  zipped.qsort ([](const void *a, const void *b) -> int
> > > > +    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> > > >
> > > > are we supposed to use lambdas?  I guess not.
> > >
> > > Oh.. wasn't aware lambdas weren't allowed.. I'll make it a function.
> > 
> > Jakub says lambdas are OK, so whatever pleases you more.
> > 
> > (**) so here you are computing a permute to undo that very exact
> > permute you discovered earlier - but I don't see how that discovered
> > permute is reality?
> > 
> > > >
> > > > Anyway, I wonder why we need to make the SLP children "linear"
> > > > in the first place?
> > >
> > > Because the instruction does the permute internally.
> > > It really is reflecting complex arithmetic.
> > 
> > Yes, I understand.
> > 
> > > >
> > > > That said, I wonder whether the x86 pattern here is more sensible
> > > > since if you have a sequence of complex adds I'm not sure your
> > > > "linear verifier" gets things optimal?  That is, in case this
> > > > is not single complex operations but in Ca + Cb Cb ends up
> > > > a complex expression.  If the ARM complex vector operation
> > > > swaps even/odd lanes of the second operand then wouldn't it
> > > > be better (and easier) to match
> > > >
> > > >  a0 = b0 - c0;
> > > >  a1 = b1 - c1;
> > > >
> > >
> > > I assume the second one should be a +?
> > 
> > Yes, sorry.
> > 
> > > > as
> > > >
> > > >  a = cadd90 (b, perm(c, { 1, 0}))
> > > >
> > > > and make the "anticipated" permute of the second operand part
> > > > of the actual pattern and to be eventually optimized by
> > > > permute optimization?  Because it's still cheaper than
> > > > what we have from the two-operator handling, namely
> > > > add, subtract and permute.  The SLP trees pasted above
> > > > do suggest that you add the anticipated permute operation
> > > > so I wonder whether all the linearization is just premature here?
> > >
> > > Consider add270:
> > >
> > >   for (int i=0; i < N; i++)
> > >       c[i] = a[i] + (b[i] * I * I * I);
> > >
> > > note:   Final SLP tree for instance 0x4461b30:
> > > note:   node 0x436c9c0 (max_nunits=4, refcnt=2)
> > > note:   op template: REALPART_EXPR <*_10> = _23;
> > > note:     stmt 0 REALPART_EXPR <*_10> = _23;
> > > note:     stmt 1 IMAGPART_EXPR <*_10> = _4;
> > > note:     children 0x436ca38
> > > note:   node 0x436ca38 (max_nunits=4, refcnt=2)
> > > note:   op: VEC_PERM_EXPR
> > > note:     stmt 0 _23 = _6 + _13;
> > > note:     stmt 1 _4 = _12 - _7;
> > > note:     lane permutation { 0[0] 1[1] }
> > > note:     children 0x436cba0 0x436cc18
> > > note:   node 0x436cba0 (max_nunits=1, refcnt=1)
> > > note:   op template: _23 = _6 + _13;
> > > note:     { }
> > > note:     children 0x436cab0 0x436cb28
> > > note:   node 0x436cab0 (max_nunits=4, refcnt=3)
> > > note:   op template: _13 = REALPART_EXPR <*_3>;
> > > note:     stmt 0 _13 = REALPART_EXPR <*_3>;
> > > note:     stmt 1 _12 = IMAGPART_EXPR <*_3>;
> > > note:     load permutation { 0 1 }
> > > note:   node 0x436cb28 (max_nunits=4, refcnt=3)
> > > note:   op template: _6 = IMAGPART_EXPR <*_5>;
> > > note:     stmt 0 _6 = IMAGPART_EXPR <*_5>;
> > > note:     stmt 1 _7 = REALPART_EXPR <*_5>;
> > > note:     load permutation { 1 0 }
> > > note:   node 0x436cc18 (max_nunits=1, refcnt=1)
> > > note:   op template: _4 = _12 - _7;
> > > note:     { }
> > > note:     children 0x436cab0 0x436cb28
> > >
> > > and add_conj:
> > >
> > >   for (int i=0; i < N; i++)
> > >       c[i] = a[i] + conjf (b[i]);
> > >
> > > note:   Final SLP tree for instance 0x4fbf5a0:
> > > note:   node 0x505d910 (max_nunits=4, refcnt=2)
> > > note:   op template: REALPART_EXPR <*_8> = _23;
> > > note:     stmt 0 REALPART_EXPR <*_8> = _23;
> > > note:     stmt 1 IMAGPART_EXPR <*_8> = _4;
> > > note:     children 0x505d988
> > > note:   node 0x505d988 (max_nunits=4, refcnt=2)
> > > note:   op: VEC_PERM_EXPR
> > > note:     stmt 0 _23 = _11 + _20;
> > > note:     stmt 1 _4 = _10 - _19;
> > > note:     lane permutation { 0[0] 1[1] }
> > > note:     children 0x505daf0 0x505db68
> > > note:   node 0x505daf0 (max_nunits=1, refcnt=1)
> > > note:   op template: _23 = _11 + _20;
> > > note:     { }
> > > note:     children 0x505da00 0x505da78
> > > note:   node 0x505da00 (max_nunits=4, refcnt=3)
> > > note:   op template: _11 = REALPART_EXPR <*_3>;
> > > note:     stmt 0 _11 = REALPART_EXPR <*_3>;
> > > note:     stmt 1 _10 = IMAGPART_EXPR <*_3>;
> > > note:     load permutation { 0 1 }
> > > note:   node 0x505da78 (max_nunits=4, refcnt=3)
> > > note:   op template: _20 = REALPART_EXPR <*_5>;
> > > note:     stmt 0 _20 = REALPART_EXPR <*_5>;
> > > note:     stmt 1 _19 = IMAGPART_EXPR <*_5>;
> > > note:     load permutation { 0 1 }
> > > note:   node 0x505db68 (max_nunits=1, refcnt=1)
> > > note:   op template: _4 = _10 - _19;
> > > note:     { }
> > > note:     children 0x505da00 0x505da78
> > >
> > > These are virtually identical. Aside from the first one having a permute in
> > > 0x436cb28 being {1, 0} and the one in 0x505da78 being {0, 1}.  But they
> > > are quite different operations. (in fact the conj case seems to match what
> > x86 has).
> > >
> > > So the problem with not checking the permutes is that you would treat
> > both of these
> > > the same and emit the instruction with the permute.  Which would
> > produce correct
> > > code but not necessarily efficient code.
> > >
> > > Swapping a {0, 1} permute is trivial, but accepting it means accepting any
> > random permute
> > > where either the permute requires a general permute operation (TBL)
> > which we cost quite
> > > high due to it's impact on register allocation and the fact it requires an index
> > register to be
> > > loaded from memory.
> > 
> > Hmm.  With having all these subtly different operations natively available
> > this indeed complicates things.  But then given a even/odd plus/minus
> > operation without a way to infer what permutation we are looking at
> > is there a good choice as to which of the even/odd lane instructions we
> > want to match?  It sounds add_conj it should be, no?
> > 
> > That said, it looks like a ordering issue with the permute optimization
> > phase to me.
> > 
> > So if we go with some heuristic then what you try to do is figure
> > if one of the operands of the pattern matched operation is already
> > perfectly linear.  For the operand the instruction can do a permutation
> > the exact permute cannot matter since you don't seem to compute an
> > exact permute but emit the "anticipated" one and leave the rest to
> > be (hopefully) optimized later.  The important part (cost-wise) seems
> > to be to not anticipate a permute where there is none.
> > 
> > > This means we will likely end up rejecting such cases based on cost alone
> > and no longer
> > > vectorize in these cases.
> > 
> > Is that so?  Without matching any pattern you'd have a vector plus and
> > a vector minus and then a tbl combining both?
> 
> No, without pattern matching we would have aborted SLP and used load/store lanes
> which results in ld2 and st2 which would have done the permutes.

Ick ;)

> The new approach would force the permutes out of the loads and make them explicit
> so load/store lanes detection won't find them.
> 
> > 
> > > The other case is when I don't even know how to make it "fit" in the
> > instruction. Consider:
> > >
> > >   for (int i=0; i < N; i+=2)
> > >     {
> > >       c[i] = a[i] - b[i];
> > >       c[i+1] = a[i+1] + b[i];
> > >     }
> > >
> > > Which becomes
> > >
> > > note:   Final SLP tree for instance 0x44e25a0:
> > > note:   node 0x45703e0 (max_nunits=2, refcnt=2)
> > > note:   op template: *_7 = _8;
> > > note:     stmt 0 *_7 = _8;
> > > note:     stmt 1 *_13 = _14;
> > > note:     children 0x4570458
> > > note:   node 0x4570458 (max_nunits=2, refcnt=2)
> > > note:   op: VEC_PERM_EXPR
> > > note:     stmt 0 _8 = _4 - _6;
> > > note:     stmt 1 _14 = _6 + _12;
> > > note:     lane permutation { 0[0] 1[1] }
> > > note:     children 0x45705c0 0x4570638
> > > note:   node 0x45705c0 (max_nunits=1, refcnt=1)
> > > note:   op template: _8 = _4 - _6;
> > > note:     { }
> > > note:     children 0x45704d0 0x4570548
> > > note:   node 0x45704d0 (max_nunits=2, refcnt=3)
> > > note:   op template: _4 = *_3;
> > > note:     stmt 0 _4 = *_3;
> > > note:     stmt 1 _12 = *_11;
> > > note:     load permutation { 0 1 }
> > > note:   node 0x4570548 (max_nunits=2, refcnt=3)
> > > note:   op template: _6 = *_5;
> > > note:     stmt 0 _6 = *_5;
> > > note:     stmt 1 _6 = *_5;
> > > note:     load permutation { 0 0 }
> > > note:   node 0x4570638 (max_nunits=1, refcnt=1)
> > > note:   op template: _14 = _6 + _12;
> > > note:     { }
> > > note:     children 0x45704d0 0x4570548
> > >
> > > Which I would need to work out on pen and paper to see if it can even
> > work
> > > With the instruction.. (we generate quite awful code for this atm with float).
> > 
> > Well, clearly the simple-minded match would add a perm node in
> > front of the b[i] load one and the permute optimization phase
> > would currently not elide it as no-op (or maybe it does, surely
> > it could).
> > 
> > > So the problem here is I can't go back to the old code should costing
> > become
> > > very expensive because of the permute it would need to insert.
> > >
> > > So I needed somewhat to reject the cases I know wouldn't generate good
> > code.
> > 
> > Yes - I think we do need to know the pattern is an obvious improvement
> > to the non-pattern state.  But I think it should always be due to the
> > removed add or subtract instruction?  Or are the complex instructions
> > more expensive than a single add or subtract?
> 
> No, the instruction itself is always cheaper, but if it has to do any preparation to
> be able to use the instruction then it may end up being more expensive due to
> secondary effects of generating the permutes.

But without matching you have two vector ops and a permute.  Isn't
the permute/blend code-generated from the two_operators handling
the very same TBL instruction?  Or is the even/odd permute much
more costly than the blend?

> Take as an example a permute where we require the use of TBL to create the valid
> even/odd pair that the instruction expects.   If we have an unrolled loop which would
> need multiple TBLs to accomplish this the cost rises even more.. 
> 
> > 
> > > >
> > > > How would we name the x86 instruction patterns which implement
> > > >
> > > >  a[i] = b[i] - c[i];
> > > >  a[i+1] = b[i+1] + c[i+1];
> > > >
> > > > ?  Those do not implement a full complex operation AFAICS
> > > > so would we name them plusminus<mode>3 and minusplus<mode>3
> > > > and fmas<mode>4, fmsa<mode>4?  They'd be the prefered match
> > > > (no anticipated permute necessary)?
> > >
> > > Yes, that makes sense. If the instructions have no expectations of a
> > > permute.
> > >
> > > So the difficult part here is I don't know how to find the right balance.
> > > You're right in that we should be able to accept the add_conj case and
> > > Just emit a permute there, as we have a single instruction for that permute.
> > >
> > > I also agree with you that it shouldn't be doing "costing" so early on,
> > > But if I don't do so, my only choices here are that it turns out to be cheap to
> > do so WIN,
> > > or it turns out to be expensive to do and we fail vectorization entirely (well
> > the loop vectorizer
> > > would probably try without SLP enabled and generate *something*, but
> > > the non-loop SLP is a bit out of luck..).
> > >
> > > If only there was a way to compare the costs for the non pattern matched
> > tree vs the
> > > pattern matched one.  But that would be quite a big addition at this point.
> > 
> > But what matters is of course the cost after permute optimization did
> > its work.
> > 
> > So I wonder if we can match cadd_conj during pattern matching and
> > wire turning that into cadd90/270 during optimize_slp when we know
> > the permute that is coming along the child?  Yes, that would put
> > knowledge of all of it into that point but thinking of this as
> > all doable in a separate pattern matching (without re-implementing
> > all of the permute optimization) doesn't look like it will work?
> > 
> > That is, when materializing a permute on a cadd_conj child we
> > can instead turn it into a cadd90/270?  We probably need to turn
> > the materialization loop into an ordered one based on the RPO
> > order computed earlier.
> 
> Do we need to do it in the loop? Probably a post step makes things a bit
> easier? Since indeed you don't want to do partial rewriting as you're in the
> loop?

Sure, that's also possible but I don't yet have a good feeling of
how this can work out in practice (also given the pecularity of
the multiplications).

> Ideally, if not for the costing we could have pushed the permute inside
> the pattern node and just have vectorizable_load map it to a different
> call?
> 
> > 
> > And if we just match cadd_conj (and the variant with even/odd
> > swapped) we could do this directly during SLP discovery as well
> > where we handle two_operators.  Do you have
> > 
> > Now the question is of course how this interacts with mul and fma/s
> > but I guess it's always the adds that introduce all the variants.
> > The mla/mls patterns have a comment
> 
> One thing this does kinda of enforce is that to match MUL you have to
> Have the ADD instruction.  Otherwise the pattern matcher has to look for
> both cadd_conj and two_op add/sub.  I don't mind this restriction but just
> pointing it out.
> 
> > 
> > +;; The complex mla/mls operations always need to expand to two
> > instructions.
> > +;; The first operation does half the computation and the second does the
> > +;; remainder.  Because of this, expand early.
> > 
> > so what are the building blocks there?  It makes it sound like
> > this is a widening multiplication or so?  Unfortunately
> > the patterns are half regular RTL and half unspec so they don't
> > really specify what is done semantically :/  It would be nice
> > if the patches with the aarch64 backend changes would be on
> > trunk already ... (on the branch I don't see anything related
> > to add_conj for example)
> 
> They're unspec since it's quite hard to express the blend they do in
> RTL.  I believe I would need one per mode..
> 
> But really the instruction is just MUL, MUL_CONJ.  The fact that on
> Arm architectures we need to expand to two instruction is just an
> ISA particularity. 
> 
> The instruction depending on the rotation value it's given either does
> the multiplication part on the real of imaginary part of the complex
> number.  Depending on the ISA the cmul version may actually exist,
> but on aarch64 we just use the FMA instruction and clear the initial
> accumulator to 0.
> 
>         movi    v1.2d, 0
>         fcmla   v1.2d, v2.2d, v0.2d, #0
>         fcmla   v1.2d, v2.2d, v0.2d, #90
> 
> The difficulty with mul is that you have to generate a permute that blends
> from two nodes.  For which you need to know which two nodes.
> 
> Consider:
> 
> note:   Final SLP tree for instance 0x472aa40:
> note:   node 0x4779730 (max_nunits=4, refcnt=2)
> note:   op template: REALPART_EXPR <*_7> = _25;
> note:     stmt 0 REALPART_EXPR <*_7> = _25;
> note:     stmt 1 IMAGPART_EXPR <*_7> = _26;
> note:     children 0x47797a8
> note:   node 0x47797a8 (max_nunits=4, refcnt=2)
> note:   op: VEC_PERM_EXPR
> note:     stmt 0 _25 = _17 - _22;
> note:     stmt 1 _26 = _23 + _24;
> note:     lane permutation { 0[0] 1[1] }
> note:     children 0x4779af0 0x4779b68
> note:   node 0x4779af0 (max_nunits=1, refcnt=1)
> note:   op template: _25 = _17 - _22;
> note:     { }
> note:     children 0x4779820 0x4779988
> note:   node 0x4779820 (max_nunits=4, refcnt=3)
> note:   op template: _17 = _10 * _19;
> note:     stmt 0 _17 = _10 * _19;
> note:     stmt 1 _23 = _10 * _18;
> note:     children 0x4779898 0x4779910
> note:   node 0x4779898 (max_nunits=4, refcnt=2)
> note:   op template: _10 = REALPART_EXPR <*_3>;
> note:     stmt 0 _10 = REALPART_EXPR <*_3>;
> note:     stmt 1 _10 = REALPART_EXPR <*_3>;
> note:     load permutation { 0 0 }
> note:   node 0x4779910 (max_nunits=4, refcnt=2)
> note:   op template: _19 = REALPART_EXPR <*_5>;
> note:     stmt 0 _19 = REALPART_EXPR <*_5>;
> note:     stmt 1 _18 = IMAGPART_EXPR <*_5>;
> note:     load permutation { 0 1 }
> note:   node 0x4779988 (max_nunits=4, refcnt=3)
> note:   op template: _22 = _9 * _18;
> note:     stmt 0 _22 = _9 * _18;
> note:     stmt 1 _24 = _9 * _19;
> note:     children 0x4779a00 0x4779a78
> note:   node 0x4779a00 (max_nunits=4, refcnt=2)
> note:   op template: _9 = IMAGPART_EXPR <*_3>;
> note:     stmt 0 _9 = IMAGPART_EXPR <*_3>;
> note:     stmt 1 _9 = IMAGPART_EXPR <*_3>;
> note:     load permutation { 1 1 }
> note:   node 0x4779a78 (max_nunits=4, refcnt=2)
> note:   op template: _18 = IMAGPART_EXPR <*_5>;
> note:     stmt 0 _18 = IMAGPART_EXPR <*_5>;
> note:     stmt 1 _19 = REALPART_EXPR <*_5>;
> note:     load permutation { 1 0 }
> note:   node 0x4779b68 (max_nunits=1, refcnt=1)
> note:   op template: _26 = _23 + _24;
> note:     { }
> note:     children 0x4779820 0x4779988
> 
> here 0x4779a00 and 0x4779898 need to be combined.
> To know which nodes need to be combined I currently use the result of the linearity analysis.
> 
> If we're moving away from that, I feel the proper thing to do would be to CSE the loads early
> on and have build_slp output a permute like we talked about.  That way I actually have a normal
> node for *_3 to use.

Yeah, that would be helpful here.  OTOH code-gen wise two
load-and-splat might be better for the CPU than a vector load
and two permutes (if we do not end up matching some fancy complex
operation pattern).  Which is why I didn't make much progress on
this area :/

Now, I see you're then using linearity analysis for correctness
parts of the transform as well (just anticipating a permute
that doesn't happen doesn't hurt correctness wise).

I'll have a more detailed look at this part after some more coffee,
I guess we do need it in some form.

Richard.

> > 
> > Btw, do you have any real-world cases that we want to optimize
> > where there's more than a single to-be-matched operation
> > operating on memory?
> 
> The MUL and FMA would do this. WRF in SPECCPU2017 also has plenty of cases where it did the transformation
> somewhere midway a calculation.  I can extract some of those if you'd like.
> 
> > 
> > Thanks,
> > Richard.
> > 
> > > Regards,
> > > Tamar
> > >
> > > >
> > > > Thanks (I hope we can simplify stuff further),
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	* tree-vect-slp-patterns.c: New file.
> > > > > 	* Makefile.in: Add it.
> > > > > 	* doc/passes.texi: Document it.
> > > > > 	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270):
> > > > New.
> > > > > 	* optabs.def (cadd90_optab, cadd270_optab): New.
> > > > > 	* doc/md.texi: Document them.
> > > > > 	* tree-vect-slp.c:
> > > > > 	(vect_free_slp_instance, vect_create_new_slp_node): Export.
> > > > > 	(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
> > > > > 	(vect_analyze_slp): Use it.
> > > > > 	* tree-vectorizer.h (vect_free_slp_tree): Export.
> > > > > 	(enum _complex_operation): Forward declare.
> > > > > 	(class vect_pattern): New
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * lib/target-supports.exp
> > > > > 	(check_effective_target_arm_v8_3a_complex_neon_ok_nocache):
> > > > Fix it.
> > > > > 	(check_effective_target_vect_complex_add_byte
> > > > > 	,check_effective_target_vect_complex_add_int
> > > > > 	,check_effective_target_vect_complex_add_short
> > > > > 	,check_effective_target_vect_complex_add_long
> > > > > 	,check_effective_target_vect_complex_add_half
> > > > > 	,check_effective_target_vect_complex_add_float
> > > > > 	,check_effective_target_vect_complex_add_double): New.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-byte.c: New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-int.c: New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-long.c: New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-
> > byte.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-
> > long.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-
> > short.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-short.c: New test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-byte.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-int.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-long.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/bb-slp-complex-add-unsigned-short.c: New
> > > > test.
> > > > >         * gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
> > > > >         * gcc.dg/vect/complex/complex-add-template.c: New test.
> > > > >         * gcc.dg/vect/complex/complex-operations-run.c: New test.
> > > > >         * gcc.dg/vect/complex/complex-operations.c: New test.
> > > > >         * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c:
> > New
> > > > test.
> > > > >         * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New
> > > > test.
> > > > >         * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-
> > > > double.c: New test.
> > > > >         * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-
> > float.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-
> > half-
> > > > float.c: New test.
> > > > >         * gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
> > > > >         * gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
> > > > >         * gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c:
> > New
> > > > test.
> > > > >         * gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c:
> > New
> > > > test.
> > > > >         * gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-byte.c: New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-int.c: New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-long.c: New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c:
> > New
> > > > test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-
> > short.c:
> > > > New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-short.c: New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-unsigned-byte.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-unsigned-int.c: New test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-unsigned-long.c: New
> > test.
> > > > >         * gcc.dg/vect/complex/vect-complex-add-unsigned-short.c: New
> > test.
> > > > >
> > > > > --- inline copy of patch --
> > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > > > index
> > > >
> > 778ec09c75d9af1cb9f2d5e7582b948c0397db65..d80657b089829fa30cede8bcf
> > > > e036dda0ec06682 100644
> > > > > --- a/gcc/Makefile.in
> > > > > +++ b/gcc/Makefile.in
> > > > > @@ -1646,6 +1646,7 @@ OBJS = \
> > > > >  	tree-vect-loop.o \
> > > > >  	tree-vect-loop-manip.o \
> > > > >  	tree-vect-slp.o \
> > > > > +	tree-vect-slp-patterns.o \
> > > > >  	tree-vectorizer.o \
> > > > >  	tree-vector-builder.o \
> > > > >  	tree-vrp.o \
> > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > index
> > > >
> > da8c9a283dd42e2b3078ed5f370a37180ee0b538..2a030a1d7373cd2b5837aa1c
> > > > 99936a6a4e4e1480 100644
> > > > > --- a/gcc/doc/md.texi
> > > > > +++ b/gcc/doc/md.texi
> > > > > @@ -6154,6 +6154,54 @@ floating-point mode.
> > > > >
> > > > >  This pattern is not allowed to @code{FAIL}.
> > > > >
> > > > > +@cindex @code{cadd90@var{m}3} instruction pattern
> > > > > +@item @samp{cadd90@var{m}3}
> > > > > +Perform vector add and subtract on even/odd number pairs.  The
> > > > operation being
> > > > > +matched is semantically described as
> > > > > +
> > > > > +@smallexample
> > > > > +  for (int i = 0; i < N; i += 2)
> > > > > +    @{
> > > > > +      c[i] = a[i] - b[i+1];
> > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > +    @}
> > > > > +@end smallexample
> > > > > +
> > > > > +This operation is semantically equivalent to performing a vector
> > addition
> > > > of
> > > > > +complex numbers in operand 1 with operand 2 rotated by 90 degrees
> > > > around
> > > > > +the argand plane and storing the result in operand 0.
> > > > > +
> > > > > +In GCC lane ordering the real part of the number must be in the even
> > > > lanes with
> > > > > +the imaginary part in the odd lanes.
> > > > > +
> > > > > +The operation is only supported for vector modes @var{m}.
> > > > > +
> > > > > +This pattern is not allowed to @code{FAIL}.
> > > > > +
> > > > > +@cindex @code{cadd270@var{m}3} instruction pattern
> > > > > +@item @samp{cadd270@var{m}3}
> > > > > +Perform vector add and subtract on even/odd number pairs.  The
> > > > operation being
> > > > > +matched is semantically described as
> > > > > +
> > > > > +@smallexample
> > > > > +  for (int i = 0; i < N; i += 2)
> > > > > +    @{
> > > > > +      c[i] = a[i] + b[i+1];
> > > > > +      c[i+1] = a[i+1] - b[i];
> > > > > +    @}
> > > > > +@end smallexample
> > > > > +
> > > > > +This operation is semantically equivalent to performing a vector
> > addition
> > > > of
> > > > > +complex numbers in operand 1 with operand 2 rotated by 270 degrees
> > > > around
> > > > > +the argand plane and storing the result in operand 0.
> > > > > +
> > > > > +In GCC lane ordering the real part of the number must be in the even
> > > > lanes with
> > > > > +the imaginary part in the odd lanes.
> > > > > +
> > > > > +The operation is only supported for vector modes @var{m}.
> > > > > +
> > > > > +This pattern is not allowed to @code{FAIL}.
> > > > > +
> > > > >  @cindex @code{ffs@var{m}2} instruction pattern
> > > > >  @item @samp{ffs@var{m}2}
> > > > >  Store into operand 0 one plus the index of the least significant 1-bit
> > > > > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> > > > > index
> > > >
> > a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99
> > > > a23386891a7b0c1 100644
> > > > > --- a/gcc/doc/passes.texi
> > > > > +++ b/gcc/doc/passes.texi
> > > > > @@ -709,7 +709,8 @@ loop.
> > > > >  The pass is implemented in @file{tree-vectorizer.c} (the main driver),
> > > > >  @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop
> > specific
> > > > parts
> > > > >  and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> > > > > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-
> > refs.c}.
> > > > > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c}
> > and
> > > > > +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
> > > > >  Analysis of data references is in @file{tree-data-ref.c}.
> > > > >
> > > > >  SLP Vectorization.  This pass performs vectorization of straight-line
> > code.
> > > > The
> > > > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > > > index
> > > >
> > 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1
> > > > 148df8cf4a509b5 100644
> > > > > --- a/gcc/internal-fn.def
> > > > > +++ b/gcc/internal-fn.def
> > > > > @@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST,
> > scalb,
> > > > binary)
> > > > >  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> > > > >  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> > > > >  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST,
> > cadd90,
> > > > binary)
> > > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST,
> > > > cadd270, binary)
> > > > > +
> > > > >
> > > > >  /* FP scales.  */
> > > > >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > > > index
> > > >
> > 5607f51e6b4b775a92d1d8ffcd3e9b53e9270d6c..e9727def4dbf941bb9ac8b56f
> > > > 83f8ea0f52b262c 100644
> > > > > --- a/gcc/optabs.def
> > > > > +++ b/gcc/optabs.def
> > > > > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
> > > > >  OPTAB_D (atanh_optab, "atanh$a2")
> > > > >  OPTAB_D (copysign_optab, "copysign$F$a3")
> > > > >  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > > > > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > > > > +OPTAB_D (cadd270_optab, "cadd270$a3")
> > > > >  OPTAB_D (cos_optab, "cos$a2")
> > > > >  OPTAB_D (cosh_optab, "cosh$a2")
> > > > >  OPTAB_D (exp10_optab, "exp10$a2")
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > byte.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..3b1e0837a323364c55094240b
> > > > 21dcc4938fa37c2
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-byte.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int8_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > int.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..33d3d13d629bb831272609c48
> > > > 4c78e6d19a7b930
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int32_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > long.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..54d0f1d6864c41fc656eeb1af3
> > > > 2736ad37dcf381
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-long.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int64_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-
> > add-
> > > > pattern-byte.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..fac77f7b626c985e4b033818a1
> > > > 0f126784d5a9a6
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > pattern-
> > > > byte.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int8_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-int.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..41a836c10c8f2f45a521912186
> > > > ab8ac5393f69fd
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > pattern-
> > > > int.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int32_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-
> > add-
> > > > pattern-long.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..175f51c46d125578520b5205c8
> > > > 6ca8a836174a2f
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > pattern-
> > > > long.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int64_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-
> > add-
> > > > pattern-short.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..c4fe72712a4d90bb5e89e6f6b
> > > > 2359029715c0bd8
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > pattern-
> > > > short.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int16_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-
> > > > complex-add-pattern-unsigned-byte.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..534a4201d54f73e0419c99a599
> > > > 55900b473107c8
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > pattern-
> > > > unsigned-byte.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint8_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-
> > > > complex-add-pattern-unsigned-int.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..9e3cf8062668b87962e0c71710
> > > > 579939f950651c
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > pattern-
> > > > unsigned-int.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint32_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-
> > > > complex-add-pattern-unsigned-long.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..398fc94154c88f2f9088910e50c
> > > > 3c1d4cc0ce17f
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > pattern-
> > > > unsigned-long.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint64_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > pattern-unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-
> > > > complex-add-pattern-unsigned-short.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..7326d29d86c27056705c6287d
> > > > a41dd0b85d5cc35
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > pattern-
> > > > unsigned-short.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint16_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > short.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..c1ce663dc7ab09875a06ad503
> > > > 81acc955dfd1fff
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-short.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int16_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-
> > add-
> > > > unsigned-byte.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..8d0c817fdae8e6ff6cdc665d6a
> > > > 132b4fc322ea61
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > unsigned-
> > > > byte.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint8_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-
> > add-
> > > > unsigned-int.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..3b08ecd0dd80f949ab88d7e74
> > > > 7602bb99fea7acc
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > unsigned-
> > > > int.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint32_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-
> > add-
> > > > unsigned-long.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..4e069ee8297064dcad7447fff6
> > > > 012a10a34543e3
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > unsigned-
> > > > long.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint64_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > > > unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-
> > add-
> > > > unsigned-short.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..88d21abd3c8ee59901df645cf5
> > > > c036c548cc6b1c
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-
> > unsigned-
> > > > short.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint16_t
> > > > > +#define N 16
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-
> > > > template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-
> > > > template.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..e8b8b19d1708673b17564b31d
> > > > 22df3443d667277
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-pattern-
> > > > template.c
> > > > > @@ -0,0 +1,60 @@
> > > > > +void add90 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i+=2)
> > > > > +    {
> > > > > +      c[i] = a[i] - b[i+1];
> > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1
> > > > "vect" } } */
> > > > > +
> > > > > +void add270 (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i+=2)
> > > > > +    {
> > > > > +      c[i] = a[i] + b[i+1];
> > > > > +      c[i+1] = a[i+1] - b[i];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270"
> > 1
> > > > "vect" } } */
> > > > > +
> > > > > +void addMixed (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i+=4)
> > > > > +    {
> > > > > +      c[i] = a[i] - b[i+1];
> > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > +      c[i+2] = a[i+2] + b[i+3];
> > > > > +      c[i+3] = a[i+3] - b[i+2];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +void add90HandUnrolled (TYPE a[restrict N], TYPE b[restrict N],
> > > > > +			TYPE c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < (N /2); i+=4)
> > > > > +    {
> > > > > +      c[i] = a[i] - b[i+1];
> > > > > +      c[i+2] = a[i+2] - b[i+3];
> > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > +      c[i+3] = a[i+3] + b[i+2];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1
> > > > "vect" } } */
> > > > > +
> > > > > +void add90Hybrid (TYPE a[restrict N], TYPE b[restrict N], TYPE c[restrict
> > N],
> > > > > +		  TYPE d[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i+=2)
> > > > > +    {
> > > > > +      c[i] = a[i] - b[i+1];
> > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > +      d[i] = a[i] - b[i];
> > > > > +      d[i+1] = a[i+1] - b[i+1];
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 2
> > > > "vect" } } */
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..afe08e867473695f0a742de330
> > > > 944f495bc541d7
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c
> > > > > @@ -0,0 +1,77 @@
> > > > > +void add0 (TYPE _Complex a[restrict N], TYPE _Complex b[restrict N],
> > > > > +	   TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = a[i] + b[i];
> > > > > +}
> > > > > +
> > > > > +void add90snd (TYPE _Complex a[restrict N], TYPE _Complex b[restrict
> > N],
> > > > > +	       TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = a[i] + (b[i] * 1.0i);
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1
> > > > "vect" } } */
> > > > > +
> > > > > +void add180snd (TYPE _Complex a[restrict N], TYPE _Complex
> > b[restrict N],
> > > > > +	        TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = a[i] + (b[i] * 1.0i * 1.0i);
> > > > > +}
> > > > > +
> > > > > +void add270snd (TYPE _Complex a[restrict N], TYPE _Complex
> > b[restrict N],
> > > > > +	        TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = a[i] + b[i];
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270"
> > 1
> > > > "vect" } } */
> > > > > +
> > > > > +void add90fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict
> > N],
> > > > > +	       TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = (a[i] * 1.0i) + b[i];
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1
> > > > "vect" } } */
> > > > > +
> > > > > +void add180fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict
> > N],
> > > > > +	        TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = (a[i] * 1.0i * 1.0i) + b[i];
> > > > > +}
> > > > > +
> > > > > +void add270fst (TYPE _Complex a[restrict N], TYPE _Complex b[restrict
> > N],
> > > > > +	        TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = (a[i] * 1.0i * 1.0i * 1.0i) + b[i];
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270"
> > 1
> > > > "vect" } } */
> > > > > +
> > > > > +void addconjfst (TYPE _Complex a[restrict N], TYPE _Complex
> > b[restrict N],
> > > > > +		 TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = ~a[i] + b[i];
> > > > > +}
> > > > > +
> > > > > +void addconjsnd (TYPE _Complex a[restrict N], TYPE _Complex
> > b[restrict
> > > > N],
> > > > > +		 TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = a[i] + ~b[i];
> > > > > +}
> > > > > +
> > > > > +void addconjboth (TYPE _Complex a[restrict N], TYPE _Complex
> > b[restrict
> > > > N],
> > > > > +		  TYPE _Complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +    c[i] = ~a[i] + ~b[i];
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations-
> > run.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..a0348a7041ca384104bc5ab688
> > > > d941c14e5b7381
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c
> > > > > @@ -0,0 +1,103 @@
> > > > > +/* { dg-do run } */
> > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#include <stdio.h>
> > > > > +#include <complex.h>
> > > > > +#include <string.h>
> > > > > +#include <float.h>
> > > > > +#include <math.h>
> > > > > +
> > > > > +#define PREF old
> > > > > +#pragma GCC push_options
> > > > > +#pragma GCC optimize ("no-tree-vectorize")
> > > > > +# include "complex-operations.c"
> > > > > +#pragma GCC pop_options
> > > > > +#undef PREF
> > > > > +
> > > > > +#define PREF new
> > > > > +# include "complex-operations.c"
> > > > > +#undef PREF
> > > > > +
> > > > > +#define TYPE double
> > > > > +#define TYPE2 double
> > > > > +#define EP pow(2, -45)
> > > > > +
> > > > > +#define xstr(s) str(s)
> > > > > +#define str(s) #s
> > > > > +
> > > > > +#define FCMP(A, B) \
> > > > > +  ((fabs (creal (A) - creal (B)) <= EP) && (fabs (cimag (A) - cimag (B)) <=
> > EP))
> > > > > +
> > > > > +#define CMP(A, B) \
> > > > > +  (FCMP(A,B) ? "PASS" : "FAIL")
> > > > > +
> > > > > +#define COMPARE(A,B) \
> > > > > +  memset (&c1, 0, sizeof (c1)); \
> > > > > +  memset (&c2, 0, sizeof (c2)); \
> > > > > +  A; B; \
> > > > > +  if (!FCMP(c1[0],c2[0]) || !FCMP(c1[1], c2[1])) \
> > > > > +  { \
> > > > > +    printf ("=> %s vs %s\n", xstr (A), xstr (B)); \
> > > > > +    printf ("%a\n", creal (c1[0]) - creal (c2[0])); \
> > > > > +    printf ("%a\n", cimag (c1[1]) - cimag (c2[1])); \
> > > > > +    printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[0]), cimag (c1[0]),
> > > > creal (c2[0]), cimag (c2[0]), CMP (c1[0], c2[0])); \
> > > > > +    printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[1]), cimag (c1[1]),
> > > > creal (c2[1]), cimag (c2[1]), CMP (c1[1], c2[1])); \
> > > > > +    printf ("\n"); \
> > > > > +    __builtin_abort (); \
> > > > > +  }
> > > > > +
> > > > > +int main ()
> > > > > +{
> > > > > +  TYPE2 complex a[] = { 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 *
> > I,
> > > > 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I,
> > 1.0
> > > > + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0
> > +
> > > > 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 +
> > 3.0
> > > > * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0
> > * I,
> > > > 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I };
> > > > > +  TYPE  complex b[] = { 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I,
> > > > 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I,
> > 1.1
> > > > + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1
> > +
> > > > 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 +
> > 3.1
> > > > * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1
> > * I,
> > > > 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I };
> > > > > +  TYPE  complex c2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0,
> > > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> > > > > +  TYPE  complex c1[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0,
> > > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> > > > > +  TYPE  diff1, diff2;
> > > > > +
> > > > > +  COMPARE(fma0_old(a, b, c1), fma0_new(a, b, c2));
> > > > > +  COMPARE(fma90_old(a, b, c1), fma90_new(a, b, c2));
> > > > > +  COMPARE(fma180_old(a, b, c1), fma180_new(a, b, c2));
> > > > > +  COMPARE(fma270_old(a, b, c1), fma270_new(a, b, c2));
> > > > > +  COMPARE(fma0_snd_old(a, b, c1), fma0_snd_new(a, b, c2));
> > > > > +  COMPARE(fma90_snd_old(a, b, c1), fma90_snd_new(a, b, c2));
> > > > > +  COMPARE(fma180_snd_old(a, b, c1), fma180_snd_new(a, b, c2));
> > > > > +  COMPARE(fma270_snd_old(a, b, c1), fma270_snd_new(a, b, c2));
> > > > > +  COMPARE(fma_conj_first_old(a, b, c1), fma_conj_first_new(a, b,
> > c2));
> > > > > +  COMPARE(fma_conj_second_old(a, b, c1), fma_conj_second_new(a,
> > b,
> > > > c2));
> > > > > +  COMPARE(fma_conj_both_old(a, b, c1), fma_conj_both_new(a, b,
> > c2));
> > > > > +  COMPARE(fms0_old(a, b, c1), fms0_new(a, b, c2));
> > > > > +  COMPARE(fms90_old(a, b, c1), fms90_new(a, b, c2));
> > > > > +  COMPARE(fms180_old(a, b, c1), fms180_new(a, b, c2));
> > > > > +  COMPARE(fms270_old(a, b, c1), fms270_new(a, b, c2));
> > > > > +  COMPARE(fms0_snd_old(a, b, c1), fms0_snd_new(a, b, c2));
> > > > > +  COMPARE(fms90_snd_old(a, b, c1), fms90_snd_new(a, b, c2));
> > > > > +  COMPARE(fms180_snd_old(a, b, c1), fms180_snd_new(a, b, c2));
> > > > > +  COMPARE(fms270_snd_old(a, b, c1), fms270_snd_new(a, b, c2));
> > > > > +  COMPARE(fms_conj_first_old(a, b, c1), fms_conj_first_new(a, b,
> > c2));
> > > > > +  COMPARE(fms_conj_second_old(a, b, c1), fms_conj_second_new(a,
> > b,
> > > > c2));
> > > > > +  COMPARE(fms_conj_both_old(a, b, c1), fms_conj_both_new(a, b,
> > c2));
> > > > > +  COMPARE(mul0_old(a, b, c1), mul0_new(a, b, c2));
> > > > > +  COMPARE(mul90_old(a, b, c1), mul90_new(a, b, c2));
> > > > > +  COMPARE(mul180_old(a, b, c1), mul180_new(a, b, c2));
> > > > > +  COMPARE(mul270_old(a, b, c1), mul270_new(a, b, c2));
> > > > > +  COMPARE(mul0_snd_old(a, b, c1), mul0_snd_new(a, b, c2));
> > > > > +  COMPARE(mul90_snd_old(a, b, c1), mul90_snd_new(a, b, c2));
> > > > > +  COMPARE(mul180_snd_old(a, b, c1), mul180_snd_new(a, b, c2));
> > > > > +  COMPARE(mul270_snd_old(a, b, c1), mul270_snd_new(a, b, c2));
> > > > > +  COMPARE(mul_conj_first_old(a, b, c1), mul_conj_first_new(a, b,
> > c2));
> > > > > +  COMPARE(mul_conj_second_old(a, b, c1), mul_conj_second_new(a,
> > b,
> > > > c2));
> > > > > +  COMPARE(mul_conj_both_old(a, b, c1), mul_conj_both_new(a, b,
> > c2));
> > > > > +  COMPARE(add0_old(a, b, c1), add0_new(a, b, c2));
> > > > > +  COMPARE(add90_old(a, b, c1), add90_new(a, b, c2));
> > > > > +  COMPARE(add180_old(a, b, c1), add180_new(a, b, c2));
> > > > > +  COMPARE(add270_old(a, b, c1), add270_new(a, b, c2));
> > > > > +  COMPARE(add0_snd_old(a, b, c1), add0_snd_new(a, b, c2));
> > > > > +  COMPARE(add90_snd_old(a, b, c1), add90_snd_new(a, b, c2));
> > > > > +  COMPARE(add180_snd_old(a, b, c1), add180_snd_new(a, b, c2));
> > > > > +  COMPARE(add270_snd_old(a, b, c1), add270_snd_new(a, b, c2));
> > > > > +  COMPARE(add_conj_first_old(a, b, c1), add_conj_first_new(a, b,
> > c2));
> > > > > +  COMPARE(add_conj_second_old(a, b, c1), add_conj_second_new(a,
> > b,
> > > > c2));
> > > > > +  COMPARE(add_conj_both_old(a, b, c1), add_conj_both_new(a, b,
> > c2));
> > > > > +}
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..fdce995481d23c6a536293c8ee
> > > > 59eaf9ca9239bf
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c
> > > > > @@ -0,0 +1,358 @@
> > > > > +#include <stdio.h>
> > > > > +#include <complex.h>
> > > > > +
> > > > > +#ifndef PREF
> > > > > +#define PREF c
> > > > > +#endif
> > > > > +
> > > > > +#define FX(N,P) P ## _ ## N
> > > > > +#define MK(N,P) FX(P,N)
> > > > > +
> > > > > +#define N 32
> > > > > +#define TYPE double
> > > > > +
> > > > > +// ------ FMA
> > > > > +
> > > > > +// Complex FMA instructions rotating the result
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma0, PREF) (TYPE complex a[restrict N], TYPE complex
> > b[restrict
> > > > N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma90, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * b[i] * I;
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma180, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * b[i] * I * I;
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma270, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * b[i] * I * I * I;
> > > > > +}
> > > > > +
> > > > > +// Complex FMA instructions rotating the second parameter.
> > > > > +
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma0_snd, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma90_snd, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * (b[i] * I);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma180_snd, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * (b[i] * I * I);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma270_snd, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * (b[i] * I * I * I);
> > > > > +}
> > > > > +
> > > > > +// Complex FMA instructions with conjucated values.
> > > > > +
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma_conj_first, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += conj (a[i]) * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma_conj_second, PREF) (TYPE complex a[restrict N], TYPE
> > > > complex b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += a[i] * conj (b[i]);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fma_conj_both, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] += conj (a[i]) * conj (b[i]);
> > > > > +}
> > > > > +
> > > > > +// ----- FMS
> > > > > +
> > > > > +// Complex FMS instructions rotating the result
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms0, PREF) (TYPE complex a[restrict N], TYPE complex
> > b[restrict
> > > > N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms90, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * b[i] * I;
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms180, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * b[i] * I * I;
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms270, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * b[i] * I * I * I;
> > > > > +}
> > > > > +
> > > > > +// Complex FMS instructions rotating the second parameter.
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms0_snd, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms90_snd, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * (b[i] * I);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms180_snd, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * (b[i] * I * I);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms270_snd, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * (b[i] * I * I * I);
> > > > > +}
> > > > > +
> > > > > +// Complex FMS instructions with conjucated values.
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms_conj_first, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= conj (a[i]) * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms_conj_second, PREF) (TYPE complex a[restrict N], TYPE
> > > > complex b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= a[i] * conj (b[i]);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(fms_conj_both, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] -= conj (a[i]) * conj (b[i]);
> > > > > +}
> > > > > +
> > > > > +
> > > > > +// ----- MUL
> > > > > +
> > > > > +// Complex MUL instructions rotating the result
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul0, PREF) (TYPE complex a[restrict N], TYPE complex
> > b[restrict
> > > > N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul90, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * b[i] * I;
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul180, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * b[i] * I * I;
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul270, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * b[i] * I * I * I;
> > > > > +}
> > > > > +
> > > > > +// Complex MUL instructions rotating the second parameter.
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul0_snd, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul90_snd, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * (b[i] * I);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul180_snd, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * (b[i] * I * I);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul270_snd, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * (b[i] * I * I * I);
> > > > > +}
> > > > > +
> > > > > +// Complex FMS instructions with conjucated values.
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul_conj_first, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = conj (a[i]) * b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul_conj_second, PREF) (TYPE complex a[restrict N], TYPE
> > > > complex b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] * conj (b[i]);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(mul_conj_both, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = conj (a[i]) * conj (b[i]);
> > > > > +}
> > > > > +
> > > > > +
> > > > > +// ----- ADD
> > > > > +
> > > > > +// Complex ADD instructions rotating the result
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add0, PREF) (TYPE complex a[restrict N], TYPE complex
> > b[restrict
> > > > N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] + b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add90, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = (a[i] + b[i]) * I;
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add180, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = (a[i] + b[i]) * I * I;
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add270, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = (a[i] + b[i]) * I * I * I;
> > > > > +}
> > > > > +
> > > > > +// Complex ADD instructions rotating the second parameter.
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add0_snd, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] + b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add90_snd, PREF) (TYPE complex a[restrict N], TYPE complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] + (b[i] * I);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add180_snd, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] + (b[i] * I * I);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add270_snd, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] + (b[i] * I * I * I);
> > > > > +}
> > > > > +
> > > > > +// Complex ADD instructions with conjucated values.
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add_conj_first, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = conj (a[i]) + b[i];
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add_conj_second, PREF) (TYPE complex a[restrict N], TYPE
> > > > complex b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = a[i] + conj (b[i]);
> > > > > +}
> > > > > +
> > > > > +__attribute__((noinline,noipa))
> > > > > +void MK(add_conj_both, PREF) (TYPE complex a[restrict N], TYPE
> > complex
> > > > b[restrict N], TYPE complex c[restrict N])
> > > > > +{
> > > > > +  for (int i=0; i < N; i++)
> > > > > +      c[i] = conj (a[i]) + conj (b[i]);
> > > > > +}
> > > > > +
> > > > > +
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > complex-
> > > > add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > > > complex-add-double.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..b5c252b176c7c21c9484574edc
> > > > 9a56d9d142e13c
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-
> > add-
> > > > double.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE double
> > > > > +#define N 16
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > complex-
> > > > add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > complex-
> > > > add-float.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..1a08e00bcede874d6acac9e2e
> > > > bece5851c583530
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-
> > add-
> > > > float.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_float } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE float
> > > > > +#define N 16
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > complex-
> > > > add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > > > complex-add-half-float.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..e4d5c55c0a88f4ac8d45262ee1
> > > > 3632443318931f
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-
> > add-
> > > > half-float.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_half } */
> > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE _Float16
> > > > > +#define N 16
> > > > > +#include "complex-add-template.c"
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > complex-
> > > > add-pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-
> > bb-
> > > > slp-complex-add-pattern-double.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..6dd3f98a7a52b21f0365cd6c43
> > > > 94b20927a6a320
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-
> > add-
> > > > pattern-double.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE double
> > > > > +#define N 16
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > complex-
> > > > add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-
> > slp-
> > > > complex-add-pattern-float.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..3d02cd455340e9510ae536d8d
> > > > 109b39f811743f0
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-
> > add-
> > > > pattern-float.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_float } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE float
> > > > > +#define N 16
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-
> > complex-
> > > > add-pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-
> > bb-
> > > > slp-complex-add-pattern-half-float.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..51dcd2724f51cb2d91f0aa234a
> > > > bc39f92275aa42
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp-complex-
> > add-
> > > > pattern-half-float.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_half } */
> > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE _Float16
> > > > > +#define N 16
> > > > > +#include "complex-add-pattern-template.c"
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > double.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..606b8992b4890e4e221315776
> > > > 1bfac62f72aa40e
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > double.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE double
> > > > > +#define N 200
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > float.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..5c640f0b14107b7cb8ad153597
> > > > 5d266e00b1d1b2
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > float.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_float } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE float
> > > > > +#define N 200
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-
> > add-
> > > > half-float.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..6111356cbd4a9c86a9356bf674
> > > > 70512db44cfed2
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > half-
> > > > float.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_half } */
> > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE _Float16
> > > > > +#define N 200
> > > > > +#include "complex-add-template.c"
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-
> > complex-
> > > > add-pattern-double.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..00f383d8cfddd1176cf4894ac7f
> > > > d4d0ae9bcb297
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > pattern-double.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_double } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE double
> > > > > +#define N 200
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-
> > > > add-pattern-float.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..ed108b14a3b704819a3c425b4
> > > > d19d1103aeb432d
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > pattern-float.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_float } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE float
> > > > > +#define N 200
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-
> > > > complex-add-pattern-half-float.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..aa239445a6563ea0ee15751a7
> > > > f6a989fb1c9d9a7
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex-add-
> > > > pattern-half-float.c
> > > > > @@ -0,0 +1,8 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_half } */
> > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE _Float16
> > > > > +#define N 200
> > > > > +#include "complex-add-pattern-template.c"
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > byte.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..4001f689671e0973b64665e6b
> > > > 9ea96c755277fae
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int8_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..1f006556af09027f22cefe12947
> > > > 5bd7e977054a0
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int32_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..1e82657abf8316228e13651d1
> > > > 11b7d256d0f266f
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int64_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > pattern-
> > > > byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > byte.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..db72e147c9dc4511fb46a0366
> > > > 79b7ba77b97ffe3
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > byte.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int8_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > pattern-
> > > > int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > int.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..8d350d69ae0eefba073aba8ae
> > > > 7b3da4b39c845df
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > int.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int32_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > pattern-
> > > > long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > long.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..c8e56cd4f91bc6254a5fb2177b
> > > > 1f2484859bcf98
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > long.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int64_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > pattern-
> > > > short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > short.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..2c54d756c9b2f54352d6dba97c
> > > > cf05d37865cbaa
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > short.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int16_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > pattern-
> > > > unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-
> > add-
> > > > pattern-unsigned-byte.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..f54b903aa308a5dc68654b9ffd
> > > > 0a0c230f58e4cc
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > unsigned-byte.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint8_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > pattern-
> > > > unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > > > pattern-unsigned-int.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..96824f16b821236f5499dcb904
> > > > 54e72a1326df5c
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > unsigned-int.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint32_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > pattern-
> > > > unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-
> > add-
> > > > pattern-unsigned-long.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..8bd9f077b233eaf6e0c4ff4df9
> > > > b97c109df7d002
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > unsigned-long.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint64_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > pattern-
> > > > unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-
> > add-
> > > > pattern-unsigned-short.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..7e5154d73703512dceda39e37
> > > > f0ebd0eb7c2e057
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-pattern-
> > > > unsigned-short.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint16_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-pattern-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > short.c
> > > > b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..ca0d618b991255f3ba34ee40f
> > > > b876fd053e8121b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE int16_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > > > unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-
> > add-
> > > > unsigned-byte.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..925cfc2ea27b0d4ffbdadfb86a
> > > > bc5c198f57469d
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-
> > > > byte.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_byte } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint8_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > > > unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > > > unsigned-int.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..6a70c6ebf0586c11a17cb1ad2c
> > > > add0d5927c2aca
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-
> > > > int.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_int } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint32_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > > > unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-
> > add-
> > > > unsigned-long.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..084080aeb4386bf41b0e23d0c
> > > > 684917b2b0435d1
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-
> > > > long.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_long } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint64_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-
> > > > unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-
> > add-
> > > > unsigned-short.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..1379608a60310fd26b18e3db2
> > > > b6294c28bf5bf2e
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-unsigned-
> > > > short.c
> > > > > @@ -0,0 +1,9 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-require-effective-target vect_complex_add_short } */
> > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */
> > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */
> > > > > +
> > > > > +#define TYPE uint16_t
> > > > > +#define N 200
> > > > > +#include <stdint.h>
> > > > > +#include "complex-add-template.c"
> > > > > \ No newline at end of file
> > > > > diff --git a/gcc/testsuite/lib/target-supports.exp
> > b/gcc/testsuite/lib/target-
> > > > supports.exp
> > > > > index
> > > >
> > 22acda2a74fdfa51aebbc311d5cc84763b0ffc63..baa5e4a569263edda2125bd8ac
> > > > a6f5b19bbad783 100644
> > > > > --- a/gcc/testsuite/lib/target-supports.exp
> > > > > +++ b/gcc/testsuite/lib/target-supports.exp
> > > > > @@ -3355,7 +3355,102 @@ proc check_effective_target_vect_int { } {
> > > > >  	}}]
> > > > >  }
> > > > >
> > > > > -# Return 1 if the target supports signed int->float conversion
> > > > > +# Return 1 if the target supports hardware vectorization of complex
> > > > additions of
> > > > > +# byte, 0 otherwise.
> > > > > +#
> > > > > +# This won't change for different subtargets so cache the result.
> > > > > +
> > > > > +proc check_effective_target_vect_complex_add_byte { } {
> > > > > +    return [check_cached_effective_target_indexed
> > > > vect_complex_add_byte {
> > > > > +      expr {
> > > > > +	 [check_effective_target_aarch64_sve2]
> > > > > +	 || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > +	}}]
> > > > > +}
> > > > > +
> > > > > +# Return 1 if the target supports hardware vectorization of complex
> > > > additions of
> > > > > +# short, 0 otherwise.
> > > > > +#
> > > > > +# This won't change for different subtargets so cache the result.
> > > > > +
> > > > > +proc check_effective_target_vect_complex_add_short { } {
> > > > > +    return [check_cached_effective_target_indexed
> > > > vect_complex_add_short {
> > > > > +      expr {
> > > > > +	 [check_effective_target_aarch64_sve2]
> > > > > +	 || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > +	}}]
> > > > > +}
> > > > > +
> > > > > +# Return 1 if the target supports hardware vectorization of complex
> > > > additions of
> > > > > +# int, 0 otherwise.
> > > > > +#
> > > > > +# This won't change for different subtargets so cache the result.
> > > > > +
> > > > > +proc check_effective_target_vect_complex_add_int { } {
> > > > > +    return [check_cached_effective_target_indexed
> > > > vect_complex_add_int {
> > > > > +      expr {
> > > > > +	 [check_effective_target_aarch64_sve2]
> > > > > +	 || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > +	}}]
> > > > > +}
> > > > > +
> > > > > +# Return 1 if the target supports hardware vectorization of complex
> > > > additions of
> > > > > +# long, 0 otherwise.
> > > > > +#
> > > > > +# This won't change for different subtargets so cache the result.
> > > > > +
> > > > > +proc check_effective_target_vect_complex_add_long { } {
> > > > > +    return [check_cached_effective_target_indexed
> > > > vect_complex_add_long {
> > > > > +      expr {
> > > > > +	 [check_effective_target_aarch64_sve2]
> > > > > +	 || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > +	}}]
> > > > > +}
> > > > > +
> > > > > +# Return 1 if the target supports hardware vectorization of complex
> > > > additions of
> > > > > +# half, 0 otherwise.
> > > > > +#
> > > > > +# This won't change for different subtargets so cache the result.
> > > > > +
> > > > > +proc check_effective_target_vect_complex_add_half { } {
> > > > > +    return [check_cached_effective_target_indexed
> > > > vect_complex_add_half {
> > > > > +      expr {
> > > > > +	 [check_effective_target_arm_v8_3a_complex_neon_ok
> > > > > +	  && check_effective_target_arm_v8_2a_fp16_neon_ok]
> > > > > +	 || [check_effective_target_aarch64_sve2]
> > > > > +	 || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > +	}}]
> > > > > +}
> > > > > +
> > > > > +# Return 1 if the target supports hardware vectorization of complex
> > > > additions of
> > > > > +# float, 0 otherwise.
> > > > > +#
> > > > > +# This won't change for different subtargets so cache the result.
> > > > > +
> > > > > +proc check_effective_target_vect_complex_add_float { } {
> > > > > +    return [check_cached_effective_target_indexed
> > > > vect_complex_add_float {
> > > > > +      expr {
> > > > > +	 [check_effective_target_arm_v8_3a_complex_neon_ok]
> > > > > +	 || [check_effective_target_aarch64_sve2]
> > > > > +	 || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > +	}}]
> > > > > +}
> > > > > +
> > > > > +# Return 1 if the target supports hardware vectorization of complex
> > > > additions of
> > > > > +# double, 0 otherwise.
> > > > > +#
> > > > > +# This won't change for different subtargets so cache the result.
> > > > > +
> > > > > +proc check_effective_target_vect_complex_add_double { } {
> > > > > +    return [check_cached_effective_target_indexed
> > > > vect_complex_add_double {
> > > > > +      expr {
> > > > > +	 [check_effective_target_arm_v8_3a_complex_neon_ok]
> > > > > +	 || [check_effective_target_aarch64_sve2]
> > > > > +	 || [check_effective_target_arm_v8_1m_mve_fp_ok]
> > > > > +	}}]
> > > > > +}
> > > > > +
> > > > > +# Return 1 if the target supports signed int->float conversion
> > > > >  #
> > > > >
> > > > >  proc check_effective_target_vect_intfloat_cvt { } {
> > > > > @@ -10367,7 +10462,7 @@ proc
> > > > check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } {
> > > > >      set et_arm_v8_3a_complex_neon_flags ""
> > > > >
> > > > >      if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> > > > > -        return 0;
> > > > > +        return 1;
> > > > >      }
> > > > >
> > > > >      # Iterate through sets of options to find the compiler flags that
> > > > > @@ -10380,11 +10475,11 @@ proc
> > > > check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } {
> > > > >              #endif
> > > > >          } "$flags -march=armv8.3-a"] } {
> > > > >              set et_arm_v8_3a_complex_neon_flags "$flags -
> > march=armv8.3-a"
> > > > > -            return 1
> > > > > +            return 0;
> > > > >          }
> > > > >      }
> > > > >
> > > > > -    return 0;
> > > > > +    return 1;
> > > > >  }
> > > > >
> > > > >  proc check_effective_target_arm_v8_3a_complex_neon_ok { } {
> > > > > @@ -10400,13 +10495,57 @@ proc
> > > > add_options_for_arm_v8_3a_complex_neon { flags } {
> > > > >      return "$flags $et_arm_v8_3a_complex_neon_flags"
> > > > >  }
> > > > >
> > > > > +# Return 1 if the target supports ARMv8.3 Adv.SIMD + FP16 Complex
> > > > instructions
> > > > > +# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
> > > > > +# Record the command line options needed.
> > > > > +
> > > > > +proc
> > > > check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache { }
> > {
> > > > > +    global et_arm_v8_3a_fp16_complex_neon_flags
> > > > > +    set et_arm_v8_3a_fp16_complex_neon_flags ""
> > > > > +
> > > > > +    if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> > > > > +        return 1;
> > > > > +    }
> > > > > +
> > > > > +    # Iterate through sets of options to find the compiler flags that
> > > > > +    # need to be added to the -march option.
> > > > > +    foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat-abi=hard
> > -
> > > > mfpu=auto"} {
> > > > > +        if { [check_no_compiler_messages_nocache \
> > > > > +                  arm_v8_3a_fp16_complex_neon_ok object {
> > > > > +            #if !defined (__ARM_FEATURE_COMPLEX)
> > > > > +            #error "__ARM_FEATURE_COMPLEX not defined"
> > > > > +            #endif
> > > > > +        } "$flags -march=armv8.3-a+fp16"] } {
> > > > > +            set et_arm_v8_3a_fp16_complex_neon_flags \
> > > > > +			"$flags -march=armv8.3-a+fp16"
> > > > > +            return 0;
> > > > > +        }
> > > > > +    }
> > > > > +
> > > > > +    return 1;
> > > > > +}
> > > > > +
> > > > > +proc check_effective_target_arm_v8_3a_fp16_complex_neon_ok { }
> > {
> > > > > +    return [check_cached_effective_target
> > > > arm_v8_3a_fp16_complex_neon_ok \
> > > > > +
> > > > check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache]
> > > > > +}
> > > > > +
> > > > > +proc add_options_for_arm_v8_3a_fp16_complex_neon { flags } {
> > > > > +    if { ! [check_effective_target_arm_v8_3a_fp16_complex_neon_ok] }
> > {
> > > > > +        return "$flags"
> > > > > +    }
> > > > > +    global et_arm_v8_3a_fp16_complex_neon_flags
> > > > > +    return "$flags $et_arm_v8_3a_fp16_complex_neon_flags"
> > > > > +}
> > > > > +
> > > > > +
> > > > >  # Return 1 if the target supports executing AdvSIMD instructions from
> > > > ARMv8.3
> > > > >  # with the complex instruction extension, 0 otherwise.  The test is valid
> > for
> > > > >  # ARM and for AArch64.
> > > > >
> > > > >  proc check_effective_target_arm_v8_3a_complex_neon_hw { } {
> > > > >      if { ![check_effective_target_arm_v8_3a_complex_neon_ok] } {
> > > > > -        return 0;
> > > > > +        return 1;
> > > > >      }
> > > > >      return [check_runtime arm_v8_3a_complex_neon_hw_available {
> > > > >          #include "arm_neon.h"
> > > > > @@ -10431,7 +10570,7 @@ proc
> > > > check_effective_target_arm_v8_3a_complex_neon_hw { } {
> > > > >                 : /* No clobbers.  */);
> > > > >            #endif
> > > > >
> > > > > -          return (results[0] == 8 && results[1] == 24) ? 1 : 0;
> > > > > +          return (results[0] == 8 && results[1] == 24) ? 0 : 1;
> > > > >          }
> > > > >      } [add_options_for_arm_v8_3a_complex_neon ""]]
> > > > >  }
> > > > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..aeb402289277c4bb48b62b7e9
> > > > e074850a99d3182
> > > > > --- /dev/null
> > > > > +++ b/gcc/tree-vect-slp-patterns.c
> > > > > @@ -0,0 +1,739 @@
> > > > > +/* SLP - Pattern matcher on SLP trees
> > > > > +   Copyright (C) 2020 Free Software Foundation, Inc.
> > > > > +
> > > > > +This file is part of GCC.
> > > > > +
> > > > > +GCC is free software; you can redistribute it and/or modify it under
> > > > > +the terms of the GNU General Public License as published by the Free
> > > > > +Software Foundation; either version 3, or (at your option) any later
> > > > > +version.
> > > > > +
> > > > > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > > > > +WARRANTY; without even the implied warranty of MERCHANTABILITY
> > or
> > > > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> > > > License
> > > > > +for more details.
> > > > > +
> > > > > +You should have received a copy of the GNU General Public License
> > > > > +along with GCC; see the file COPYING3.  If not see
> > > > > +<http://www.gnu.org/licenses/>.  */
> > > > > +
> > > > > +#include "config.h"
> > > > > +#include "system.h"
> > > > > +#include "coretypes.h"
> > > > > +#include "backend.h"
> > > > > +#include "target.h"
> > > > > +#include "rtl.h"
> > > > > +#include "tree.h"
> > > > > +#include "gimple.h"
> > > > > +#include "tree-pass.h"
> > > > > +#include "ssa.h"
> > > > > +#include "optabs-tree.h"
> > > > > +#include "insn-config.h"
> > > > > +#include "recog.h"		/* FIXME: for insn_data */
> > > > > +#include "fold-const.h"
> > > > > +#include "stor-layout.h"
> > > > > +#include "gimple-iterator.h"
> > > > > +#include "cfgloop.h"
> > > > > +#include "tree-vectorizer.h"
> > > > > +#include "langhooks.h"
> > > > > +#include "gimple-walk.h"
> > > > > +#include "dbgcnt.h"
> > > > > +#include "tree-vector-builder.h"
> > > > > +#include "vec-perm-indices.h"
> > > > > +#include "gimple-fold.h"
> > > > > +#include "internal-fn.h"
> > > > > +
> > > > > +/* SLP Pattern matching mechanism.
> > > > > +
> > > > > +  This extension to the SLP vectorizer allows one to transform the
> > > > generated SLP
> > > > > +  tree based on any pattern.  The difference between this and the
> > normal
> > > > vect
> > > > > +  pattern matcher is that unlike the former, this matcher allows you to
> > > > match
> > > > > +  with instructions that do not belong to the same SSA dominator
> > graph.
> > > > > +
> > > > > +  The only requirement that this pattern matcher has is that you are
> > only
> > > > > +  only allowed to either match an entire group or none.
> > > > > +
> > > > > +  The pattern matcher currently only allows you to perform
> > replacements
> > > > to
> > > > > +  internal functions.
> > > > > +
> > > > > +  Once the patterns are matched it is one way, these cannot be
> > undone.  It
> > > > is
> > > > > +  currently not supported to match patterns recursively.
> > > > > +
> > > > > +  To add a new pattern, implement the vect_pattern class and add the
> > > > type to
> > > > > +  slp_patterns.
> > > > > +
> > > > > +*/
> > > > > +
> > > > >
> > > >
> > +/*********************************************************
> > > > **********************
> > > > > + * vect_pattern class
> > > > > +
> > > >
> > **********************************************************
> > > > ********************/
> > > > > +
> > > > > +/* Default implementation of recognize that peforms matching,
> > validation
> > > > and
> > > > > +   replacement of nodes but that can be overriden if required.  */
> > > > > +
> > > > > +static bool
> > > > > +vect_pattern_validate_optab (internal_fn ifn, slp_tree node)
> > > > > +{
> > > > > +  tree vectype = SLP_TREE_VECTYPE (node);
> > > > > +  if (ifn == IFN_LAST || !vectype)
> > > > > +    return false;
> > > > > +
> > > > > +  if (dump_enabled_p ())
> > > > > +    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +		     "Found %s pattern in SLP tree\n",
> > > > > +		     internal_fn_name (ifn));
> > > > > +
> > > > > +  if (direct_internal_fn_supported_p (ifn, vectype,
> > > > OPTIMIZE_FOR_SPEED))
> > > > > +    {
> > > > > +      if (dump_enabled_p ())
> > > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			 "Target supports %s vectorization with mode %T\n",
> > > > > +			 internal_fn_name (ifn), vectype);
> > > > > +    }
> > > > > +  else
> > > > > +    {
> > > > > +      if (dump_enabled_p ())
> > > > > +        {
> > > > > +	  if (!vectype)
> > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			     "Target does not support vector type for %T\n",
> > > > > +			     SLP_TREE_DEF_TYPE (node));
> > > > > +	  else
> > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			     "Target does not support %s for vector type "
> > > > > +			     "%T\n", internal_fn_name (ifn), vectype);
> > > > > +	}
> > > > > +      return false;
> > > > > +    }
> > > > > +  return true;
> > > > > +}
> > > > > +
> > > > >
> > > >
> > +/*********************************************************
> > > > **********************
> > > > > + * General helper types
> > > > > +
> > > >
> > **********************************************************
> > > > ********************/
> > > > > +
> > > > > +/* The COMPLEX_OPERATION enum denotes the possible pair of
> > > > operations that can
> > > > > +   be matched when looking for expressions that we are interested
> > > > matching for
> > > > > +   complex numbers addition and mla.  */
> > > > > +
> > > > > +typedef enum _complex_operation : unsigned {
> > > > > +  PLUS_PLUS,
> > > > > +  MINUS_PLUS,
> > > > > +  PLUS_MINUS,
> > > > > +  MULT_MULT,
> > > > > +  CMPLX_NONE
> > > > > +} complex_operation_t;
> > > > > +
> > > > >
> > > >
> > +/*********************************************************
> > > > **********************
> > > > > + * General helper functions
> > > > > +
> > > >
> > **********************************************************
> > > > ********************/
> > > > > +
> > > > > +/* Helper function of linear_loads_p that checks to see if the load
> > > > permutation
> > > > > +   is sequential and in monotonically increasing order of loads with no
> > gaps.
> > > > > +*/
> > > > > +
> > > > > +static inline bool
> > > > > +is_linear_load_p (load_permutation_t loads)
> > > > > +{
> > > > > +  if (loads.length() == 0)
> > > > > +    return false;
> > > > > +
> > > > > +  unsigned leader = loads[0];
> > > > > +  unsigned load, i;
> > > > > +  FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
> > > > > +    if (load != ++leader)
> > > > > +      return false;
> > > > > +  return true;
> > > > > +}
> > > > > +
> > > > > +
> > > > > +/* Check to see if all loads rooted in ROOT are linear.  Linearity is
> > > > > +   defined as having no gaps between values loaded.  */
> > > > > +
> > > > > +static load_permutation_t
> > > > > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache,
> > slp_tree
> > > > root,
> > > > > +		bool *linear)
> > > > > +{
> > > > > +  *linear = false;
> > > > > +  if (!root)
> > > > > +    return vNULL;
> > > > > +
> > > > > +  unsigned i;
> > > > > +  load_permutation_t loads = vNULL;
> > > > > +  load_permutation_t *tmp;
> > > > > +
> > > > > +  if ((tmp = perm_cache->get (root)) != NULL)
> > > > > +    {
> > > > > +      *linear = is_linear_load_p (*tmp);
> > > > > +      return *tmp;
> > > > > +    }
> > > > > +
> > > > > +  perm_cache->put (root, vNULL);
> > > > > +
> > > > > +  /* If it's a load node, then just read the load permute.  */
> > > > > +  if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> > > > > +    {
> > > > > +      loads = SLP_TREE_LOAD_PERMUTATION (root);
> > > > > +      perm_cache->put (root, loads);
> > > > > +      if (!is_linear_load_p (loads))
> > > > > +	return loads;
> > > > > +    }
> > > > > +  else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> > > > > +    {
> > > > > +       loads.create (SLP_TREE_LANES (root));
> > > > > +       tree op;
> > > > > +       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op)
> > > > > +	 {
> > > > > +	   if (TREE_CODE (op) != SSA_NAME)
> > > > > +	     return vNULL;
> > > > > +
> > > > > +	   gimple *defstmt = SSA_NAME_DEF_STMT (op);
> > > > > +	   if (!is_gimple_assign (defstmt))
> > > > > +	     return vNULL;
> > > > > +
> > > > > +	   switch (gimple_assign_rhs_code (defstmt))
> > > > > +	   {
> > > > > +	     case IMAGPART_EXPR:
> > > > > +	       loads.safe_push (1);
> > > > > +	       break;
> > > > > +	     case REALPART_EXPR:
> > > > > +	       loads.safe_push (0);
> > > > > +	       break;
> > > > > +	     default:
> > > > > +	       {
> > > > > +		 loads.release ();
> > > > > +		 return vNULL;
> > > > > +	       }
> > > > > +	   }
> > > > > +	 }
> > > > > +
> > > > > +       perm_cache->put (root, loads);
> > > > > +       if (!is_linear_load_p (loads))
> > > > > +	 return loads;
> > > > > +    }
> > > > > +  else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> > > > > +    return vNULL;
> > > > > +
> > > > > +  auto_vec<load_permutation_t> all_loads;
> > > > > +  bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> > > > > +
> > > > > +  slp_tree child;
> > > > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > > > > +    {
> > > > > +      loads = linear_loads_p (perm_cache, child, linear);
> > > > > +      if ((!*linear && !is_perm) || !loads.exists ())
> > > > > +	return loads;
> > > > > +
> > > > > +      all_loads.safe_push (loads);
> > > > > +    }
> > > > > +
> > > > > +  if (is_perm)
> > > > > +    {
> > > > > +      lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION
> > (root);
> > > > > +      load_permutation_t nloads;
> > > > > +      nloads.create (SLP_TREE_LANES (root));
> > > > > +      nloads.quick_grow (SLP_TREE_LANES (root));
> > > > > +      for (i = 0; i < SLP_TREE_LANES (root); i++)
> > > > > +	nloads[i] = all_loads[perm[i].first][perm[i].second];
> > > > > +
> > > > > +      perm_cache->put (root, nloads);
> > > > > +      if (!is_linear_load_p (nloads))
> > > > > +	return nloads;
> > > > > +      loads = nloads;
> > > > > +    }
> > > > > +
> > > > > +  perm_cache->put (root, loads);
> > > > > +  *linear = true;
> > > > > +  return loads;
> > > > > +}
> > > > > +
> > > > > +
> > > > > +/* This function attempts to make a node rooted in NODE with parent
> > > > PARENT
> > > > > +   linear.  If the node if already linear than the node itself is returned
> > > > > +   in RESULT.
> > > > > +
> > > > > +   If the node is not linear then a new VEC_PERM_EXPR node is created
> > > > with a
> > > > > +   lane permute that when applied will make the node linear.   If such a
> > > > > +   permute cannot be created then FALSE is returned from the function.
> > > > > +
> > > > > +   Here linearity is defined as having a sequential, monotically
> > increasing
> > > > > +   load position inside the load permute generated by the loads
> > reachable
> > > > from
> > > > > +   NODE.  */
> > > > > +
> > > > > +static bool
> > > > > +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache,
> > > > > +		      slp_tree parent, slp_tree node, slp_tree *result)
> > > > > +{
> > > > > +  bool is_linear = false;
> > > > > +  unsigned x, val;
> > > > > +  load_permutation_t load_perm = linear_loads_p (perm_cache, node,
> > > > &is_linear);
> > > > > +  if (is_linear)
> > > > > +    {
> > > > > +      *result = node;
> > > > > +      SLP_TREE_REF_COUNT (node)++;
> > > > > +      return true;
> > > > > +    }
> > > > > +
> > > > > +  /* Attempt to linearise the permute.  */
> > > > > +  vec<std::pair<unsigned, unsigned> > zipped;
> > > > > +  zipped.create (load_perm.length ());
> > > > > +  FOR_EACH_VEC_ELT (load_perm, x, val)
> > > > > +    zipped.quick_push (std::make_pair (val, x));
> > > > > +
> > > > > +  typedef const std::pair<unsigned, unsigned>* cmp_t;
> > > > > +  zipped.qsort ([](const void *a, const void *b) -> int
> > > > > +    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> > > > > +
> > > > > +  /* Verify if we have a linear permute sequence.  */
> > > > > +  if (zipped.length () > 0)
> > > > > +    {
> > > > > +      unsigned leader = zipped[0].first;
> > > > > +      for (x = 1; x < zipped.length (); x++)
> > > > > +	if(!(is_linear = (zipped[x].first == ++leader)))
> > > > > +	  break;
> > > > > +    }
> > > > > +
> > > > > +  if (!is_linear)
> > > > > +    {
> > > > > +      if (dump_enabled_p ())
> > > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			"Loads could not be made linear %p\n",
> > > > > +			node);
> > > > > +      zipped.release ();
> > > > > +      return false;
> > > > > +  }
> > > > > +
> > > > > +  for (x = 0; x < zipped.length (); x++)
> > > > > +    zipped[x].first = 0;
> > > > > +
> > > > > +  /* Create the new permute node and store it instead.  */
> > > > > +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > > > > +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > > > > +  SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
> > > > > +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > > > > +  SLP_TREE_CHILDREN (vnode).quick_push (node);
> > > > > +  SLP_TREE_REF_COUNT (vnode) = 1;
> > > > > +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
> > > > > +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> > > > (parent);
> > > > > +  SLP_TREE_REF_COUNT (node)++;
> > > > > +  *result = vnode;
> > > > > +  return is_linear;
> > > > > +}
> > > > > +
> > > > > +/* Checks to see of the expression represented by NODE is a gimple
> > > > assign with
> > > > > +   code CODE.  */
> > > > > +
> > > > > +static inline bool
> > > > > +vect_match_expression_p (slp_tree node, tree_code code)
> > > > > +{
> > > > > +  if (!node
> > > > > +      || !SLP_TREE_REPRESENTATIVE (node))
> > > > > +    return false;
> > > > > +
> > > > > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> > > > (node));
> > > > > +  if (!is_gimple_assign (expr)
> > > > > +      || gimple_assign_rhs_code (expr) != code)
> > > > > +    return false;
> > > > > +
> > > > > +  return true;
> > > > > +}
> > > > > +
> > > > > +/* Check if the given lane permute in PERMUTES matches an
> > alternating
> > > > sequence
> > > > > +   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.  Further mode
> > > > > +   there resulting permute must be linear.   */
> > > > > +
> > > > > +static inline bool
> > > > > +vect_check_lane_permute (lane_permutation_t &permutes,
> > > > > +			 unsigned p0, unsigned p1)
> > > > > +{
> > > > > +  if (permutes.length () == 0)
> > > > > +    return false;
> > > > > +
> > > > > +  unsigned val[2] = {p0, p1};
> > > > > +  unsigned seed = permutes[0].second;
> > > > > +  for (unsigned i = 0; i < permutes.length (); i++)
> > > > > +    if (permutes[i].first != val[i % 2]
> > > > > +	|| permutes[i].second != seed++)
> > > > > +      return false;
> > > > > +
> > > > > +  return true;
> > > > > +}
> > > > > +
> > > > > +/* This function will match the two gimple expressions representing
> > > > NODE1 and
> > > > > +   NODE2 in parallel and returns the pair operation that represents the
> > two
> > > > > +   expressions in the two statements.
> > > > > +
> > > > > +   If match is successful then the corresponding complex_operation is
> > > > > +   returned and the arguments to the two matched operations are
> > > > returned in OPS.
> > > > > +
> > > > > +   If TWO_OPERANDS it is expected that the LANES of the parent
> > > > VEC_PERM select
> > > > > +   from the two nodes alternatingly.
> > > > > +
> > > > > +   If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
> > > > > +
> > > > > +   e.g. the following gimple statements
> > > > > +
> > > > > +   stmt 0 _39 = _37 + _12;
> > > > > +   stmt 1 _6 = _38 - _36;
> > > > > +
> > > > > +   will return PLUS_MINUS along with OPS containing {_37, _12, _38,
> > _36}.
> > > > > +*/
> > > > > +
> > > > > +static complex_operation_t
> > > > > +vect_detect_pair_op (slp_tree node1, slp_tree node2,
> > > > lane_permutation_t &lanes,
> > > > > +		     bool two_operands = true, vec<slp_tree> *ops = NULL)
> > > > > +{
> > > > > +  complex_operation_t result = CMPLX_NONE;
> > > > > +
> > > > > +  if (vect_match_expression_p (node1, MINUS_EXPR)
> > > > > +      && vect_match_expression_p (node2, PLUS_EXPR)
> > > > > +      && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > > > > +    result = MINUS_PLUS;
> > > > > +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> > > > > +	   && vect_match_expression_p (node2, MINUS_EXPR)
> > > > > +	   && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > > > > +    result = PLUS_MINUS;
> > > > > +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> > > > > +	   && vect_match_expression_p (node2, PLUS_EXPR))
> > > > > +    result = PLUS_PLUS;
> > > > > +  else if (vect_match_expression_p (node1, MULT_EXPR)
> > > > > +	   && vect_match_expression_p (node2, MULT_EXPR))
> > > > > +    result = MULT_MULT;
> > > > > +
> > > > > +  if (result != CMPLX_NONE && ops != NULL)
> > > > > +    {
> > > > > +      ops->create (2);
> > > > > +      ops->quick_push (node1);
> > > > > +      ops->quick_push (node2);
> > > > > +    }
> > > > > +  return result;
> > > > > +}
> > > > > +
> > > > > +/* Overload of vect_detect_pair_op that matches against the
> > > > representative
> > > > > +   statements in the children of NODE.  It is expected that NODE has
> > > > exactly
> > > > > +   two children and when TWO_OPERANDS then NODE must be a
> > > > VEC_PERM.  */
> > > > > +
> > > > > +static complex_operation_t
> > > > > +vect_detect_pair_op (slp_tree node, bool two_operands = true,
> > > > > +		     vec<slp_tree> *ops = NULL)
> > > > > +{
> > > > > +  if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> > > > > +    return CMPLX_NONE;
> > > > > +
> > > > > +  if (SLP_TREE_CHILDREN (node).length () != 2)
> > > > > +    return CMPLX_NONE;
> > > > > +
> > > > > +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > > > > +  lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION
> > (node);
> > > > > +
> > > > > +  return vect_detect_pair_op (children[0], children[1], lanes,
> > > > two_operands,
> > > > > +			      ops);
> > > > > +}
> > > > > +
> > > > >
> > > >
> > +/*********************************************************
> > > > **********************
> > > > > + * complex_pattern class
> > > > > +
> > > >
> > **********************************************************
> > > > ********************/
> > > > > +
> > > > > +/* SLP Complex Numbers pattern matching.
> > > > > +
> > > > > +  As an example, the following simple loop:
> > > > > +
> > > > > +    double a[restrict N]; double b[restrict N]; double c[restrict N];
> > > > > +
> > > > > +    for (int i=0; i < N; i+=2)
> > > > > +    {
> > > > > +      c[i] = a[i] - b[i+1];
> > > > > +      c[i+1] = a[i+1] + b[i];
> > > > > +    }
> > > > > +
> > > > > +  which represents a complex addition on with a rotation of 90* around
> > > > the
> > > > > +  argand plane. i.e. if `a` and `b` were complex numbers then this
> > would be
> > > > the
> > > > > +  same as `a + (b * I)`.
> > > > > +
> > > > > +  Here the expressions for `c[i]` and `c[i+1]` are independent but have
> > to
> > > > be
> > > > > +  both recognized in order for the pattern to work.  As an SLP tree this
> > is
> > > > > +  represented as
> > > > > +
> > > > > +                +--------------------------------+
> > > > > +                |       stmt 0 *_9 = _10;        |
> > > > > +                |       stmt 1 *_15 = _16;       |
> > > > > +                +--------------------------------+
> > > > > +                                |
> > > > > +                                |
> > > > > +                                v
> > > > > +                +--------------------------------+
> > > > > +                |     stmt 0 _10 = _4 - _8;      |
> > > > > +                |    stmt 1 _16 = _12 + _14;     |
> > > > > +                | lane permutation { 0[0] 1[1] } |
> > > > > +                +--------------------------------+
> > > > > +                            |        |
> > > > > +                            |        |
> > > > > +                            |        |
> > > > > +               +-----+      |        |      +-----+
> > > > > +               |     |      |        |      |     |
> > > > > +         +-----| { } |<-----+        +----->| { } --------+
> > > > > +         |     |     |   +------------------|     |       |
> > > > > +         |     +-----+   |                  +-----+       |
> > > > > +         |        |      |                                |
> > > > > +         |        |      |                                |
> > > > > +         |        +------|------------------+             |
> > > > > +         |               |                  |             |
> > > > > +         v               v                  v             v
> > > > > +     +--------------------------+     +--------------------------------+
> > > > > +     |     stmt 0 _8 = *_7;     |     |        stmt 0 _4 = *_3;        |
> > > > > +     |    stmt 1 _14 = *_13;    |     |       stmt 1 _12 = *_11;       |
> > > > > +     | load permutation { 1 0 } |     |    load permutation { 0 1 }    |
> > > > > +     +--------------------------+     +--------------------------------+
> > > > > +
> > > > > +  The pattern matcher allows you to replace both statements 0 and 1
> > or
> > > > none at
> > > > > +  all.  Because this operation is a two operands operation the actual
> > nodes
> > > > > +  being replaced are those in the { } nodes.  The actual scalar
> > statements
> > > > > +  themselves are not replaced or used during the matching but instead
> > the
> > > > > +  SLP_TREE_REPRESENTATIVE statements are inspected.  You are also
> > > > allowed to
> > > > > +  replace and match on any number of nodes.
> > > > > +
> > > > > +  Because the pattern matcher matches on the representative
> > statement
> > > > for the
> > > > > +  SLP node the case of two_operators it allows you to match the
> > children
> > > > of the
> > > > > +  node.  This is done using the method `recognize ()`.
> > > > > +
> > > > > +*/
> > > > > +
> > > > > +/* The complex_pattern class contains common code for pattern
> > > > matchers that work
> > > > > +   on complex numbers.  These provide functionality to allow de-
> > > > construction and
> > > > > +   validation of sequences depicting/transforming REAL and IMAG pairs.
> > */
> > > > > +
> > > > > +class complex_pattern : public vect_pattern
> > > > > +{
> > > > > +  protected:
> > > > > +    auto_vec<slp_tree> m_workset;
> > > > > +    complex_pattern (slp_tree *node, vec<slp_tree> *m_ops,
> > internal_fn
> > > > ifn)
> > > > > +      : vect_pattern (node, m_ops, ifn)
> > > > > +    {
> > > > > +      this->m_workset.safe_push (*node);
> > > > > +    }
> > > > > +
> > > > > +  public:
> > > > > +    void build (slp_tree_to_load_perm_map_t *, vec_info *);
> > > > > +
> > > > > +    static internal_fn
> > > > > +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t
> > *,
> > > > > +	     vec<slp_tree> *);
> > > > > +};
> > > > > +
> > > > > +/* Create a replacement pattern statement for each node in m_node
> > and
> > > > inserts
> > > > > +   the new statement into m_node as the new representative
> > statement.
> > > > The old
> > > > > +   statement is marked as being in a pattern defined by the new
> > statement.
> > > > The
> > > > > +   statement is created as call to internal function IFN with
> > m_num_args
> > > > > +   arguments.
> > > > > +
> > > > > +   Futhermore the new pattern is also added to the vectorization
> > > > information
> > > > > +   structure VINFO and the old statement STMT_INFO is marked as
> > unused
> > > > while
> > > > > +   the new statement is marked as used and the number of SLP uses of
> > the
> > > > new
> > > > > +   statement is incremented.
> > > > > +
> > > > > +   The newly created SLP nodes are marked as SLP only and will be
> > > > dissolved
> > > > > +   if SLP is aborted.
> > > > > +
> > > > > +   The newly created gimple call is returned and the BB remains
> > unchanged.
> > > > > +
> > > > > +   This default method is designed to only match against simple
> > operands
> > > > where
> > > > > +   all the input and output types are the same.
> > > > > +*/
> > > > > +
> > > > > +void
> > > > > +complex_pattern::build (slp_tree_to_load_perm_map_t
> > *perm_cache,
> > > > > +			vec_info *vinfo)
> > > > > +{
> > > > > +  stmt_vec_info stmt_info;
> > > > > +
> > > > > +  auto_vec<tree> args;
> > > > > +  args.create (this->m_num_args);
> > > > > +  args.quick_grow_cleared (this->m_num_args);
> > > > > +  slp_tree node;
> > > > > +  unsigned ix;
> > > > > +  stmt_vec_info call_stmt_info;
> > > > > +  gcall *call_stmt = NULL;
> > > > > +  auto_vec<slp_tree> nodes;
> > > > > +  slp_tree tmp = NULL;
> > > > > +  node = this->m_ops[0];
> > > > > +
> > > > > +  /* First re-arrange the children.  */
> > > > > +
> > > > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), ix, tmp)
> > > > > +    {
> > > > > +      slp_tree vnode = NULL;
> > > > > +      if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > > > > +	nodes.safe_push (vnode);
> > > > > +      else
> > > > > +	{
> > > > > +	  FOR_EACH_VEC_ELT (nodes, ix, tmp)
> > > > > +	    vect_free_slp_tree (tmp);
> > > > > +
> > > > > +	  return;
> > > > > +	}
> > > > > +    }
> > > > > +
> > > > > +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > > > > +    vect_free_slp_tree (node);
> > > > > +
> > > > > +  SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > > > > +  SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
> > > > > +
> > > > > +  /* Now modify the nodes themselves.  */
> > > > > +  FOR_EACH_VEC_ELT (this->m_workset, ix, node)
> > > > > +    {
> > > > > +      /* Calculate the location of the statement in NODE to replace.  */
> > > > > +      stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > > > > +      gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> > > > > +      tree lhs_old_stmt = gimple_get_lhs (old_stmt);
> > > > > +      tree type = TREE_TYPE (lhs_old_stmt);
> > > > > +
> > > > > +      /* Create the argument set for use by
> > gimple_build_call_internal_vec.
> > > > */
> > > > > +      for (unsigned i = 0; i < this->m_num_args; i++)
> > > > > +	args[i] = lhs_old_stmt;
> > > > > +
> > > > > +      /* Create the new pattern statements.  */
> > > > > +      call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
> > > > > +      tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
> > > > > +      gimple_call_set_lhs (call_stmt, var);
> > > > > +      gimple_set_location (call_stmt, gimple_location (old_stmt));
> > > > > +      gimple_call_set_nothrow (call_stmt, true);
> > > > > +
> > > > > +      /* Adjust the book-keeping for the new and old statements for use
> > > > during
> > > > > +	 SLP.  This is required to get the right VF and statement during SLP
> > > > > +	 analysis.  These changes are created after relevancy has been set for
> > > > > +	 the nodes as such we need to manually update them.  Any changes
> > > > will be
> > > > > +	 undone if SLP is cancelled.  */
> > > > > +      call_stmt_info
> > > > > +	= vinfo->add_pattern_stmt (call_stmt, stmt_info);
> > > > > +      STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope;
> > > > > +
> > > > > +      /* Unfortunately still need this on the new pattern because non-
> > loop
> > > > SLP
> > > > > +	 doesn't call vect_detect_hybrid_slp so it never updates it.  */
> > > > > +      STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > > > > +
> > > > > +      /* add_pattern_stmt can't be done in vect_mark_pattern_stmts
> > > > because
> > > > > +	 the non-SLP pattern matchers already have added the statement to
> > > > VINFO
> > > > > +	 by the time it is called.  Some of them need to modify the returned
> > > > > +	 stmt_info.  vect_mark_pattern_stmts is called by recog_pattern and
> > > > it
> > > > > +	 would increase the size of each pattern with boilerplate code to
> > > > make
> > > > > +	 the call there.  */
> > > > > +      vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
> > > > > +			       SLP_TREE_VECTYPE (node));
> > > > > +
> > > > > +      /* Since we are replacing all the statements in the group with the
> > same
> > > > > +	 thing it doesn't really matter.  So just set it every time a new stmt
> > > > > +	 is created.  */
> > > > > +      SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
> > > > > +      SLP_TREE_CODE (node) = CALL_EXPR;
> > > > > +    }
> > > > > +}
> > > > > +
> > > > >
> > > >
> > +/*********************************************************
> > > > **********************
> > > > > + * complex_add_pattern class
> > > > > +
> > > >
> > **********************************************************
> > > > ********************/
> > > > > +
> > > > > +class complex_add_pattern : public complex_pattern
> > > > > +{
> > > > > +  protected:
> > > > > +    complex_add_pattern (slp_tree *node, vec<slp_tree> *m_ops,
> > > > internal_fn ifn)
> > > > > +      : complex_pattern (node, m_ops, ifn)
> > > > > +    {
> > > > > +      this->m_num_args = 2;
> > > > > +    }
> > > > > +
> > > > > +  public:
> > > > > +    static internal_fn
> > > > > +    matches (complex_operation_t op, slp_tree_to_load_perm_map_t
> > *,
> > > > > +	     vec<slp_tree> *);
> > > > > +
> > > > > +    static vect_pattern*
> > > > > +    recognize (slp_tree_to_load_perm_map_t *, slp_tree *);
> > > > > +};
> > > > > +
> > > > > +/* Pattern matcher for trying to match complex addition pattern in SLP
> > > > tree.
> > > > > +
> > > > > +   If no match is found then IFN is set to IFN_LAST.
> > > > > +   This function matches the patterns shaped as:
> > > > > +
> > > > > +   c[i] = a[i] - b[i+1];
> > > > > +   c[i+1] = a[i+1] + b[i];
> > > > > +
> > > > > +   If a match occurred then TRUE is returned, else FALSE.  The initial
> > match
> > > > is
> > > > > +   expected to be in OP1 and the initial match operands in args0.  */
> > > > > +
> > > > > +internal_fn
> > > > > +complex_add_pattern::matches (complex_operation_t op,
> > > > > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > > > > +			      vec<slp_tree> *ops)
> > > > > +{
> > > > > +  internal_fn ifn = IFN_LAST;
> > > > > +
> > > > > +  /* Find the two components.  Rotation in the complex plane will
> > modify
> > > > > +     the operations:
> > > > > +
> > > > > +      * Rotation  0: + +
> > > > > +      * Rotation 90: - +
> > > > > +      * Rotation 180: - -
> > > > > +      * Rotation 270: + -
> > > > > +
> > > > > +      Rotation 0 and 180 can be handled by normal SIMD code, so we
> > don't
> > > > need
> > > > > +      to care about them here.  */
> > > > > +  if (op == MINUS_PLUS)
> > > > > +    ifn = IFN_COMPLEX_ADD_ROT90;
> > > > > +  else if (op == PLUS_MINUS)
> > > > > +    ifn = IFN_COMPLEX_ADD_ROT270;
> > > > > +  else
> > > > > +    return ifn;
> > > > > +
> > > > > +  /* verify that there is a permute, otherwise this isn't a pattern we
> > > > > +     we support.  */
> > > > > +  bool is_linear = false;
> > > > > +  gcc_assert (ops->length () == 2);
> > > > > +
> > > > > +  vec<slp_tree> children = SLP_TREE_CHILDREN ((*ops)[0]);
> > > > > +
> > > > > +  /* First node must be unpermuted.  */
> > > > > +  linear_loads_p (perm_cache, children[0], &is_linear);
> > > > > +  if (!is_linear)
> > > > > +    return IFN_LAST;
> > > > > +
> > > > > +  /* Second node must be permuted.  */
> > > > > +  if (linear_loads_p (perm_cache, children[1], &is_linear).length () > 0
> > > > > +      && is_linear)
> > > > > +    return IFN_LAST;
> > > > > +
> > > > > +  return ifn;
> > > > > +}
> > > > > +
> > > > > +vect_pattern*
> > > > > +complex_add_pattern::recognize (slp_tree_to_load_perm_map_t
> > > > *perm_cache,
> > > > > +				slp_tree *node)
> > > > > +{
> > > > > +  auto_vec<slp_tree> ops;
> > > > > +  complex_operation_t op
> > > > > +    = vect_detect_pair_op (*node, true, &ops);
> > > > > +  internal_fn ifn = complex_add_pattern::matches (op, perm_cache,
> > > > &ops);
> > > > > +  if (!vect_pattern_validate_optab (ifn, *node))
> > > > > +    return NULL;
> > > > > +
> > > > > +  return new complex_add_pattern (node, &ops, ifn);
> > > > > +}
> > > > > +
> > > > >
> > > >
> > +/*********************************************************
> > > > **********************
> > > > > + * Pattern matching definitions
> > > > > +
> > > >
> > **********************************************************
> > > > ********************/
> > > > > +
> > > > > +#define SLP_PATTERN(x) &x::recognize
> > > > > +vect_pattern_decl_t slp_patterns[]
> > > > > +{
> > > > > +  /* For least amount of back-tracking and more efficient matching
> > > > > +     order patterns from the largest to the smallest.  Especially if they
> > > > > +     overlap in what they can detect.  */
> > > > > +
> > > > > +  SLP_PATTERN (complex_add_pattern),
> > > > > +};
> > > > > +#undef SLP_PATTERN
> > > > > +
> > > > > +/* Set the number of SLP pattern matchers available.  */
> > > > > +size_t num__slp_patterns =
> > > > sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
> > > > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> > > > > index
> > > >
> > d19874f175703a96b1c1110874067fdbec48c068..7f5fbdbd4969036b5db1cb698
> > > > da970304c87b03b 100644
> > > > > --- a/gcc/tree-vect-slp.c
> > > > > +++ b/gcc/tree-vect-slp.c
> > > > > @@ -105,7 +105,7 @@ _slp_tree::~_slp_tree ()
> > > > >
> > > > >  /* Recursively free the memory allocated for the SLP tree rooted at
> > NODE.
> > > > */
> > > > >
> > > > > -static void
> > > > > +void
> > > > >  vect_free_slp_tree (slp_tree node)
> > > > >  {
> > > > >    int i;
> > > > > @@ -148,7 +148,7 @@ vect_free_slp_instance (slp_instance instance)
> > > > >
> > > > >  /* Create an SLP node for SCALAR_STMTS.  */
> > > > >
> > > > > -slp_tree
> > > > > +static slp_tree
> > > > >  vect_create_new_slp_node (slp_tree node,
> > > > >  			  vec<stmt_vec_info> scalar_stmts, unsigned nops)
> > > > >  {
> > > > > @@ -165,7 +165,7 @@ vect_create_new_slp_node (slp_tree node,
> > > > >
> > > > >  /* Create an SLP node for SCALAR_STMTS.  */
> > > > >
> > > > > -static slp_tree
> > > > > +slp_tree
> > > > >  vect_create_new_slp_node (vec<stmt_vec_info> scalar_stmts,
> > unsigned
> > > > nops)
> > > > >  {
> > > > >    return vect_create_new_slp_node (new _slp_tree, scalar_stmts,
> > nops);
> > > > > @@ -2175,6 +2175,84 @@ calculate_unrolling_factor (poly_uint64
> > nunits,
> > > > unsigned int group_size)
> > > > >    return exact_div (common_multiple (nunits, group_size), group_size);
> > > > >  }
> > > > >
> > > > > +/* Helper function of vect_match_slp_patterns.
> > > > > +
> > > > > +   Attempts to match patterns against the slp tree rooted in REF_NODE
> > > > using
> > > > > +   VINFO.  Patterns are matched in post-order traversal.
> > > > > +
> > > > > +   If matching is successful the value in REF_NODE is updated and
> > returned,
> > > > if
> > > > > +   not then it is returned unchanged.  */
> > > > > +
> > > > > +static bool
> > > > > +vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info *vinfo,
> > > > > +			   slp_tree_to_load_perm_map_t *perm_cache,
> > > > > +			   hash_set<slp_tree> *visited)
> > > > > +{
> > > > > +  unsigned i;
> > > > > +  slp_tree node = *ref_node;
> > > > > +  bool found_p = false;
> > > > > +  if (!node || visited->add (node))
> > > > > +    return false;
> > > > > +
> > > > > +  slp_tree child;
> > > > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > > > > +    found_p |= vect_match_slp_patterns_2 (&SLP_TREE_CHILDREN
> > > > (node)[i],
> > > > > +					  vinfo, perm_cache, visited);
> > > > > +
> > > > > +  for (unsigned x = 0; x < num__slp_patterns; x++)
> > > > > +    {
> > > > > +      vect_pattern *pattern = slp_patterns[x] (perm_cache, ref_node);
> > > > > +      if (pattern)
> > > > > +	{
> > > > > +	  pattern->build (perm_cache, vinfo);
> > > > > +	  delete pattern;
> > > > > +	  found_p = true;
> > > > > +	}
> > > > > +    }
> > > > > +
> > > > > +  return found_p;
> > > > > +}
> > > > > +
> > > > > +/* Applies pattern matching to the given SLP tree rooted in REF_NODE
> > > > using
> > > > > +   vec_info VINFO.
> > > > > +
> > > > > +   The modified tree is returned.  Patterns are tried in order and
> > multiple
> > > > > +   patterns may match.  */
> > > > > +
> > > > > +static bool
> > > > > +vect_match_slp_patterns (slp_instance instance, vec_info *vinfo,
> > > > > +			 hash_set<slp_tree> *visited,
> > > > > +			 slp_tree_to_load_perm_map_t *perm_cache,
> > > > > +			 scalar_stmts_to_slp_tree_map_t * /* bst_map */)
> > > > > +{
> > > > > +  DUMP_VECT_SCOPE ("vect_match_slp_patterns");
> > > > > +  slp_tree *ref_node = &SLP_INSTANCE_TREE (instance);
> > > > > +
> > > > > +  if (dump_enabled_p ())
> > > > > +    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +		     "Analyzing SLP tree %p for patterns\n",
> > > > > +		     SLP_INSTANCE_TREE (instance));
> > > > > +
> > > > > +  bool found_p
> > > > > +    = vect_match_slp_patterns_2 (ref_node, vinfo, perm_cache,
> > visited);
> > > > > +
> > > > > +  if (found_p)
> > > > > +    {
> > > > > +      if (dump_enabled_p ())
> > > > > +	{
> > > > > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			   "Pattern matched SLP tree\n");
> > > > > +	  vect_print_slp_graph (MSG_NOTE, vect_location, *ref_node);
> > > > > +	}
> > > > > +    }
> > > > > +
> > > > > +  return found_p;
> > > > > +}
> > > > > +
> > > > > +/* Analyze an SLP instance starting from a group of grouped stores.
> > Call
> > > > > +   vect_build_slp_tree to build a tree of packed stmts if possible.
> > > > > +   Return FALSE if it's impossible to SLP any stmt in the loop.  */
> > > > > +
> > > > >  static bool
> > > > >  vect_analyze_slp_instance (vec_info *vinfo,
> > > > >  			   scalar_stmts_to_slp_tree_map_t *bst_map,
> > > > > @@ -2540,6 +2618,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned
> > > > max_tree_size)
> > > > >  {
> > > > >    unsigned int i;
> > > > >    stmt_vec_info first_element;
> > > > > +  slp_instance instance;
> > > > >
> > > > >    DUMP_VECT_SCOPE ("vect_analyze_slp");
> > > > >
> > > > > @@ -2586,6 +2665,13 @@ vect_analyze_slp (vec_info *vinfo, unsigned
> > > > max_tree_size)
> > > > >  				   slp_inst_kind_reduc_group,
> > > > max_tree_size);
> > > > >      }
> > > > >
> > > > > +  hash_set<slp_tree> visited_patterns;
> > > > > +  slp_tree_to_load_perm_map_t perm_cache;
> > > > > +  /* See if any patterns can be found in the SLP tree.  */
> > > > > +  FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i,
> > instance)
> > > > > +    vect_match_slp_patterns (instance, vinfo, &visited_patterns,
> > > > &perm_cache,
> > > > > +			     bst_map);
> > > > > +
> > > > >    /* The map keeps a reference on SLP nodes built, release that.  */
> > > > >    for (scalar_stmts_to_slp_tree_map_t::iterator it = bst_map->begin ();
> > > > >         it != bst_map->end (); ++it)
> > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > index
> > > >
> > 91e2e10761d591b99ad55467e4719219ea5c0e49..ea39f56365e6c6fcbaaeb9cde
> > > > 769a81a109d6af3 100644
> > > > > --- a/gcc/tree-vectorizer.h
> > > > > +++ b/gcc/tree-vectorizer.h
> > > > > @@ -27,6 +27,7 @@ typedef class _stmt_vec_info *stmt_vec_info;
> > > > >  #include "tree-hash-traits.h"
> > > > >  #include "target.h"
> > > > >  #include "alloc-pool.h"
> > > > > +#include "internal-fn.h"
> > > > >
> > > > >
> > > > >  /* Used for naming of new temporaries.  */
> > > > > @@ -1994,6 +1995,7 @@ extern void duplicate_and_interleave
> > (vec_info *,
> > > > gimple_seq *, tree,
> > > > >  extern int vect_get_place_in_interleaving_chain (stmt_vec_info,
> > > > stmt_vec_info);
> > > > >  extern bool vect_update_shared_vectype (stmt_vec_info, tree);
> > > > >  extern slp_tree vect_create_new_slp_node (vec<stmt_vec_info>,
> > > > unsigned);
> > > > > +extern void vect_free_slp_tree (slp_tree);
> > > > >
> > > > >  /* In tree-vect-patterns.c.  */
> > > > >  extern void
> > > > > @@ -2010,4 +2012,67 @@ void vect_free_loop_info_assumptions
> > (class
> > > > loop *);
> > > > >  gimple *vect_loop_vectorized_call (class loop *, gcond **cond = NULL);
> > > > >  bool vect_stmt_dominates_stmt_p (gimple *, gimple *);
> > > > >
> > > > > +/* SLP Pattern matcher types, tree-vect-slp-patterns.c.  */
> > > > > +
> > > > > +/* Forward declaration of possible two operands operation that can
> > be
> > > > matched
> > > > > +   by the complex numbers pattern matchers.  */
> > > > > +enum _complex_operation : unsigned;
> > > > > +
> > > > > +/* Cache from nodes to the load permutation they represent.  */
> > > > > +typedef hash_map <slp_tree, load_permutation_t >
> > > > > +  slp_tree_to_load_perm_map_t;
> > > > > +
> > > > > +/* Vector pattern matcher base class.  All SLP pattern matchers must
> > > > inherit
> > > > > +   from this type.  */
> > > > > +
> > > > > +class vect_pattern
> > > > > +{
> > > > > +  protected:
> > > > > +    /* The number of arguments that the IFN requires.  */
> > > > > +    unsigned m_num_args;
> > > > > +
> > > > > +    /* The internal function that will be used when a pattern is created.
> > */
> > > > > +    internal_fn m_ifn;
> > > > > +
> > > > > +    /* The current node being inspected.  */
> > > > > +    slp_tree *m_node;
> > > > > +
> > > > > +    /* The list of operands to be the children for the node produced
> > when
> > > > the
> > > > > +       internal function is created.  */
> > > > > +    vec<slp_tree> m_ops;
> > > > > +
> > > > > +    /* Default constructor where NODE is the root of the tree to inspect.
> > */
> > > > > +    vect_pattern (slp_tree *node, vec<slp_tree> *m_ops, internal_fn
> > ifn)
> > > > > +    {
> > > > > +      this->m_ifn = ifn;
> > > > > +      this->m_node = node;
> > > > > +      this->m_ops.create (0);
> > > > > +      this->m_ops.safe_splice (*m_ops);
> > > > > +    }
> > > > > +
> > > > > +  public:
> > > > > +
> > > > > +    /* Create a new instance of the pattern matcher class of the given
> > type.
> > > > */
> > > > > +    static vect_pattern* recognize (slp_tree_to_load_perm_map_t *,
> > > > slp_tree *);
> > > > > +
> > > > > +    /* Build the pattern from the data collected so far.  */
> > > > > +    virtual void build (slp_tree_to_load_perm_map_t *, vec_info *) = 0;
> > > > > +
> > > > > +    /* Default destructor.  */
> > > > > +    virtual ~vect_pattern ()
> > > > > +    {
> > > > > +	this->m_ops.release ();
> > > > > +    }
> > > > > +};
> > > > > +
> > > > > +/* Function pointer to create a new pattern matcher from a generic
> > type.
> > > > */
> > > > > +typedef vect_pattern* (*vect_pattern_decl_t)
> > > > (slp_tree_to_load_perm_map_t *,
> > > > > +					      slp_tree *);
> > > > > +
> > > > > +/* List of supported pattern matchers.  */
> > > > > +extern vect_pattern_decl_t slp_patterns[];
> > > > > +
> > > > > +/* Number of supported pattern matchers.  */
> > > > > +extern size_t num__slp_patterns;
> > > > > +
> > > > >  #endif  /* GCC_TREE_VECTORIZER_H  */
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > Nuernberg,
> > > > Germany; GF: Felix Imend
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg,
> > Germany; GF: Felix Imend
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend