From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by sourceware.org (Postfix) with ESMTPS id 2C52B39724A3 for ; Fri, 27 Nov 2020 10:30:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2C52B39724A3 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rguenther@suse.de X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id C894BAD09; Fri, 27 Nov 2020 10:30:03 +0000 (UTC) Date: Fri, 27 Nov 2020 10:30:03 +0000 (UTC) From: Richard Biener To: Tamar Christina cc: "gcc-patches@gcc.gnu.org" , nd , "ook@ucw.cz" , "hongtao.liu@intel.com" Subject: RE: [PATCH] middle-end: Support complex Addition In-Reply-To: Message-ID: References: User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Nov 2020 10:30:21 -0000 On Thu, 26 Nov 2020, Tamar Christina wrote: > Hi Richi, > > > -----Original Message----- > > From: Richard Biener > > Sent: Tuesday, November 24, 2020 2:15 PM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd ; ook@ucw.cz; > > hongtao.liu@intel.com > > Subject: RE: [PATCH] middle-end: Support complex Addition > > > > On Tue, 24 Nov 2020, Tamar Christina wrote: > > > > > > > > > > > > -----Original Message----- > > > > From: Richard Biener > > > > Sent: Tuesday, November 24, 2020 12:24 PM > > > > To: Tamar Christina > > > > Cc: gcc-patches@gcc.gnu.org; nd ; ook@ucw.cz; > > > > hongtao.liu@intel.com > > > > Subject: RE: [PATCH] middle-end: Support complex Addition > > > > > > > > On Tue, 24 Nov 2020, Tamar Christina wrote: > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Richard Biener > > > > > > Sent: Tuesday, November 24, 2020 10:54 AM > > > > > > To: Tamar Christina > > > > > > Cc: gcc-patches@gcc.gnu.org; nd ; ook@ucw.cz; > > > > > > hongtao.liu@intel.com > > > > > > Subject: RE: [PATCH] middle-end: Support complex Addition > > > > > > > > > > > > On Tue, 24 Nov 2020, Richard Biener wrote: > > > > > > > > > > > > > On Mon, 23 Nov 2020, Tamar Christina wrote: > > > > > > > > > > > > > > > Hi Richi, > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: Richard Biener > > > > > > > > > Sent: Monday, November 23, 2020 3:51 PM > > > > > > > > > To: Tamar Christina > > > > > > > > > Cc: gcc-patches@gcc.gnu.org; nd ; ook@ucw.cz; > > > > > > > > > hongtao.liu@intel.com > > > > > > > > > Subject: Re: [PATCH] middle-end: Support complex Addition > > > > > > > > > > > > > > > > > > On Mon, 23 Nov 2020, Tamar Christina wrote: > > > > > > > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > > > > > > > This patch adds support for > > > > > > > > > > > > > > > > > > > > * Complex Addition with rotation of 90 and 270. > > > > > > > > > > > > > > > > > > > > Addition with rotation of the second argument around the > > > > Argand > > > > > > plane. > > > > > > > > > > Supported rotations are 90 and 180. > > > > > > > > > > > > > > > > > > > > c = a + (b * I) and c = a + (b * I * I * I) > > > > > > > > > > > > > > > > > > > > For the full code I have pushed a branch at > > > > > > > > > refs/users/tnfchris/heads/complex-numbers. > > > > > > > > > > > > > > > > > > > > As a side note, I still needed to set > > > > > > > > > > > > > > > > > > > > STMT_SLP_TYPE (call_stmt_info) = pure_slp; > > > > > > > > > > > > > > > > > > > > as the new hybrid detection code only runs for loop aware SLP. > > > > > > > > > > > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no > > issues, > > > > > > but > > > > > > > > > > sorting out the testcases as TCL is processed before the CPP.. > > > > > > > > > > > > > > > > > > > > Ok for master? > > > > > > > > > > > > > > > > > > So I failed to apply this patch (and after manual fixup build). > > > > > > > > > I went ahead and checked out the branch, patching the tree > > with > > > > > > > > > x86 support for cadd90 with -msse3 or -mavx2 using the > > attached > > > > > > > > > patch. > > > > > > > > > > > > > > > > > > > > > > > > > It requires a patch you have previously approved pending the rest > > so > > > > it's > > > > > > not committed yet ? > > > > > > > > > > > > > > Ah, I missed that. > > > > > > > > > > > > > > > > For > > > > > > > > > > > > > > > > > > double c[1024], b[1024], a[1024]; > > > > > > > > > > > > > > > > > > void foo () > > > > > > > > > { > > > > > > > > > for (int i = 0; i < 512; ++i) > > > > > > > > > { > > > > > > > > > c[2*i] = a[2*i] - b[2*i+1]; > > > > > > > > > c[2*i+1] = a[2*i+1] + b[2*i]; > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > I then see > > > > > > > > > > > > > > > > > > t.c:5:21: note: Analyzing SLP tree 0x39c0010 for patterns > > > > > > > > > t.c:5:21: note: Found COMPLEX_ADD_ROT90 pattern in SLP > > tree > > > > > > > > > t.c:5:21: note: Target supports COMPLEX_ADD_ROT90 > > > > vectorization > > > > > > with > > > > > > > > > mode vector(2) double > > > > > > > > > t.c:5:21: note: Pattern matched SLP tree > > > > > > > > > t.c:5:21: note: node 0x39c0010 (max_nunits=2, refcnt=2) > > > > > > > > > t.c:5:21: note: op template: c[_1] = _5; > > > > > > > > > t.c:5:21: note: stmt 0 c[_1] = _5; > > > > > > > > > t.c:5:21: note: stmt 1 c[_3] = _8; > > > > > > > > > t.c:5:21: note: children 0x39c0080 > > > > > > > > > t.c:5:21: note: node 0x39c0080 (max_nunits=2, refcnt=2) > > > > > > > > > t.c:5:21: note: op template: slp_patt_29 > > = .COMPLEX_ADD_ROT90 > > > > (_5, > > > > > > _5); > > > > > > > > > t.c:5:21: note: stmt 0 _5 = _2 - _4; > > > > > > > > > t.c:5:21: note: stmt 1 _8 = _6 + _7; > > > > > > > > > t.c:5:21: note: lane permutation { 0[0] 1[1] } > > > > > > > > > t.c:5:21: note: children 0x39c00f0 0x39c02b0 > > > > > > > > > t.c:5:21: note: node 0x39c00f0 (max_nunits=2, refcnt=2) > > > > > > > > > t.c:5:21: note: op template: _2 = a[_1]; > > > > > > > > > t.c:5:21: note: stmt 0 _2 = a[_1]; > > > > > > > > > t.c:5:21: note: stmt 1 _6 = a[_3]; > > > > > > > > > t.c:5:21: note: load permutation { 0 1 } > > > > > > > > > t.c:5:21: note: node 0x39c02b0 (max_nunits=1, refcnt=1) > > > > > > > > > t.c:5:21: note: op: VEC_PERM_EXPR > > > > > > > > > t.c:5:21: note: { } > > > > > > > > > t.c:5:21: note: lane permutation { 0[1] 0[0] } > > > > > > > > > t.c:5:21: note: children 0x39c0160 > > > > > > > > > t.c:5:21: note: node 0x39c0160 (max_nunits=2, refcnt=2) > > > > > > > > > t.c:5:21: note: op template: _4 = b[_3]; > > > > > > > > > t.c:5:21: note: stmt 0 _4 = b[_3]; > > > > > > > > > t.c:5:21: note: stmt 1 _7 = b[_1]; > > > > > > > > > t.c:5:21: note: load permutation { 1 0 } > > > > > > > > > > > > > > > > > > I'm confused about the lane permutation in > > > > > > the .COMPLEX_ADD_ROT90 > > > > > > > > > node (I guess this permutation is simply ignored by code- > > > > generation). > > > > > > > > > Should it not be there? > > > > > > > > > > > > > > > > Yes, I had completely missed that. I forgot to blank it out. > > > > > > > > > > > > > > Btw, in this context > > > > > > > > > > > > > > /* Unfortunately still need this on the new pattern because non- > > > > loop > > > > > > > SLP > > > > > > > doesn't call vect_detect_hybrid_slp so it never updates it. */ > > > > > > > STMT_SLP_TYPE (call_stmt_info) = pure_slp; > > > > > > > > > > > > > > this isnt' about the hybrid marker but about vect_mark_slp_stmts > > > > > > > which marks all stmts participating in the SLP graph with pure_slp > > > > > > > which only marks SLP_TREE_SCALAR_STMTS but not > > > > > > SLP_TREE_REPRESENTATIVE. > > > > > > > I think that's OK and thus the above setting of pure_slp is OK as well, > > > > > > > just the comment is off. Maybe make it "Make sure to mark the > > > > > > > representative statement pure_slp and relevant". > > > > > > > > > > > > > > > > > > > > > > > > > Otherwise the outcome is now as expected. Permute > > optimization > > > > > > > > > later produces > > > > > > > > > > > > > > > > > > t.c:5:21: note: node 0x39c0080 (max_nunits=2, refcnt=1) > > > > > > > > > t.c:5:21: note: op template: slp_patt_29 > > = .COMPLEX_ADD_ROT90 > > > > (_5, > > > > > > _5); > > > > > > > > > t.c:5:21: note: stmt 0 _5 = _2 - _4; > > > > > > > > > t.c:5:21: note: stmt 1 _8 = _6 + _7; > > > > > > > > > t.c:5:21: note: lane permutation { 0[0] 1[1] } > > > > > > > > > t.c:5:21: note: children 0x39c00f0 0x39c02b0 > > > > > > > > > ... > > > > > > > > > t.c:5:21: note: node 0x39c02b0 (max_nunits=1, refcnt=1) > > > > > > > > > t.c:5:21: note: op: VEC_PERM_EXPR > > > > > > > > > t.c:5:21: note: { } > > > > > > > > > t.c:5:21: note: lane permutation { 0[0] 0[1] } > > > > > > > > > t.c:5:21: note: children 0x39c0160 > > > > > > > > > t.c:5:21: note: node 0x39c0160 (max_nunits=2, refcnt=1) > > > > > > > > > t.c:5:21: note: op template: _4 = b[_3]; > > > > > > > > > t.c:5:21: note: stmt 0 _7 = b[_1]; > > > > > > > > > t.c:5:21: note: stmt 1 _4 = b[_3]; > > > > > > > > > > > > > > > > > > where the noop permute is correctly costed (and thus is just a > > > > > > > > > cosmetic annoyance): > > > > > > > > > > > > > > > > > > 0x3a13870 a[_1] 1 times vector_load costs 12 in body > > > > > > > > > 0x3a13870 b[_1] 1 times vector_load costs 12 in body > > > > > > > > > 0x3a13870 0 times vec_perm costs 0 in body > > > > > > > > > 0x3a13870 .COMPLEX_ADD_ROT90 (_5, _5) 1 times vector_stmt > > > > costs > > > > > > 12 in > > > > > > > > > body > > > > > > > > > 0x3a13870 _5 1 times vector_store costs 12 in body > > > > > > > > > > > > > > > > > > Code generated is also superior (-msse3): > > > > > > > > > > > > > > > > > > .L2: > > > > > > > > > movapd a(%rax), %xmm0 > > > > > > > > > addsubpd b(%rax), %xmm0 > > > > > > > > > addq $16, %rax > > > > > > > > > movaps %xmm0, c-16(%rax) > > > > > > > > > cmpq $8192, %rax > > > > > > > > > jne .L2 > > > > > > > > > > > > > > > > > > compared to GCC 10 where we have an extra permute > > > > > > > > > > > > > > > > > > .L2: > > > > > > > > > movapd b(%rax), %xmm0 > > > > > > > > > movapd a(%rax), %xmm1 > > > > > > > > > addq $16, %rax > > > > > > > > > shufpd $1, %xmm0, %xmm0 > > > > > > > > > addsubpd %xmm0, %xmm1 > > > > > > > > > movaps %xmm1, c-16(%rax) > > > > > > > > > cmpq $8192, %rax > > > > > > > > > jne .L2 > > > > > > > > > > > > > > > > > > which of course makes me wonder whether I have done the x86 > > > > > > > > > support correctly. Ah, I have not. The x86 instructions > > > > > > > > > do not embed the even/odd lane swap, they just do the mixed > > > > > > > > > sign operation. So for those we'd need additional optabs > > > > > > > > > and patterns then. > > > > > > > > > > > > > > > > > > So I see the branch contains only the complex add so I'm > > > > > > > > > going through the changes there: > > > > > > > > > > > > > > > > Yes I'm still updating MUL, FMA and FMS are tiny extensions to > > MUL. > > > > > > > > > > > > > > > > > > > > > > > > > > /* Create an SLP node for SCALAR_STMTS. */ > > > > > > > > > > > > > > > > > > -static slp_tree > > > > > > > > > +slp_tree > > > > > > > > > vect_create_new_slp_node (slp_tree node, > > > > > > > > > vec scalar_stmts, unsigned nops) > > > > > > > > > { > > > > > > > > > SLP_TREE_SCALAR_STMTS (node) = scalar_stmts; > > > > > > > > > SLP_TREE_CHILDREN (node).create (nops); > > > > > > > > > SLP_TREE_DEF_TYPE (node) = vect_internal_def; > > > > > > > > > - SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0]; > > > > > > > > > - SLP_TREE_LANES (node) = scalar_stmts.length (); > > > > > > > > > + if (scalar_stmts.exists ()) > > > > > > > > > + { > > > > > > > > > + SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0]; > > > > > > > > > + SLP_TREE_LANES (node) = scalar_stmts.length (); > > > > > > > > > + } > > > > > > > > > return node; > > > > > > > > > } > > > > > > > > > > > > > > > > > > so I don't like that very much, I guess we instead want a > > > > > > > > > > > > > > > > > > vect_create_new_perm_node (slp_node node, nops) > > > > > > > > > > > > > > > > > > which can pre-fill SLP_TREE_CODE. > > > > > > > > > > > > > > > > > > You add testsuite/gcc.dg/vect/complex/ but there's neither an > > > > > > > > > .exp file in it nor is it sourced from vect.exp - I suppose > > > > > > > > > some bits are missing here on the branch? > > > > > > > > > > > > > > > > Ugg, sorry... I forgot a git add... > > > > > > > > > > > > > > > > > > > > > > > > > > +typedef enum _complex_operation : unsigned { > > > > > > > > > > > > > > > > > > uh, oh - C++ I don't know. Is : unsigned required? > > > > > > > > > > > > > > > > > > > > > > > > > It requires an enum base, so either enum E : int or enum class E, > > > > > > > > which apparently defaults to int. > > > > > > > > > > > > > > > > > > > > > > > > > > +/* Check to see if all loads rooted in ROOT are linear. Linearity > > is > > > > > > > > > + defined as having no gaps between values loaded. */ > > > > > > > > > > > > > > > > > > what is actually returned? > > > > > > > > > > > > > > > > It returns the load permute that the node being inspected would > > > > > > produce. > > > > > > > > Or rather, it shows how the data flows through the tree rooted at > > that > > > > > > node. > > > > > > > > > > > > > > > > It's used a to determine if the operation being done does the > > > > odd/even > > > > > > lane > > > > > > > > swapping. This becomes more important for MUL as I need to > > > > distinguish > > > > > > between > > > > > > > > a conjucate and a rotation. Both of which produce just a negate > > node, > > > > > > but what they > > > > > > > > negate determines what the operation is. > > > > > > > > > > > > > > So it basically computes what optimize_slp does in its dataflow of > > > > > > > permutes? But you do > > > > > > > > > > > > > > auto_vec all_loads; > > > > > > > bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists (); > > > > > > > > > > > > > > slp_tree child; > > > > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child) > > > > > > > { > > > > > > > loads = linear_loads_p (perm_cache, child, linear); > > > > > > > if ((!*linear && !is_perm) || !loads.exists ()) > > > > > > > return loads; > > > > > > > > > > > > > > so when there's a branch in the SLP graph and either one is > > > > > > > not linear you return the permute on that branch? Or if there > > > > > > > isn't any permute on one branch you return that. Whatever comes > > > > > > > first? The code misses at least comments explaining on what > > > > > > > it computes for the root of a SLP subgraph (note the graph can > > > > > > > now be cyclic as to where I don't really see how that is handled > > > > > > > here). (**) > > > > > > > > > > > > > > > > > > > > > > > > > +static load_permutation_t > > > > > > > > > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, > > > > > > slp_tree > > > > > > > > > root, > > > > > > > > > + bool *linear) > > > > > > > > > +{ > > > > > > > > > ... > > > > > > > > > + else if (SLP_TREE_DEF_TYPE (root) == vect_external_def) > > > > > > > > > + { > > > > > > > > > + loads.create (SLP_TREE_LANES (root)); > > > > > > > > > > > > > > > > > > it's weird that you need to dig into vect_external_defs - if the > > > > > > > > > vectorizer for whatever reason decided to not make the defs > > > > internal > > > > > > > > > you shouldn't pick them up here? > > > > > > > > > > > > > > > > I do so because for the purposes of these instructions you need to > > > > have > > > > > > an > > > > > > > > alternating sequence. If you say have the same externals { _a , _a } > > > > that > > > > > > operation > > > > > > > > isn't what the instruction expects. Accepting random externals > > was > > > > also > > > > > > causing ICEs > > > > > > > > when compiling SPECFP 2017 but didn't look too deeply into this as > > I > > > > > > couldn't convince > > > > > > > > myself that it should match these. > > > > > > > > > > > > > > Did you actually run into a testcase with external loads? > > > > > > > > > > > > > > > > > > > > > > > > > + typedef const std::pair* cmp_t; > > > > > > > > > + zipped.qsort ([](const void *a, const void *b) -> int > > > > > > > > > + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; }); > > > > > > > > > > > > > > > > > > are we supposed to use lambdas? I guess not. > > > > > > > > > > > > > > > > Oh.. wasn't aware lambdas weren't allowed.. I'll make it a function. > > > > > > > > > > > > > > Jakub says lambdas are OK, so whatever pleases you more. > > > > > > > > > > > > > > (**) so here you are computing a permute to undo that very exact > > > > > > > permute you discovered earlier - but I don't see how that > > discovered > > > > > > > permute is reality? > > > > > > > > > > > > > > > > > > > > > > > > > Anyway, I wonder why we need to make the SLP children > > "linear" > > > > > > > > > in the first place? > > > > > > > > > > > > > > > > Because the instruction does the permute internally. > > > > > > > > It really is reflecting complex arithmetic. > > > > > > > > > > > > > > Yes, I understand. > > > > > > > > > > > > > > > > > > > > > > > > > That said, I wonder whether the x86 pattern here is more > > sensible > > > > > > > > > since if you have a sequence of complex adds I'm not sure your > > > > > > > > > "linear verifier" gets things optimal? That is, in case this > > > > > > > > > is not single complex operations but in Ca + Cb Cb ends up > > > > > > > > > a complex expression. If the ARM complex vector operation > > > > > > > > > swaps even/odd lanes of the second operand then wouldn't it > > > > > > > > > be better (and easier) to match > > > > > > > > > > > > > > > > > > a0 = b0 - c0; > > > > > > > > > a1 = b1 - c1; > > > > > > > > > > > > > > > > > > > > > > > > > I assume the second one should be a +? > > > > > > > > > > > > > > Yes, sorry. > > > > > > > > > > > > > > > > as > > > > > > > > > > > > > > > > > > a = cadd90 (b, perm(c, { 1, 0})) > > > > > > > > > > > > > > > > > > and make the "anticipated" permute of the second operand > > part > > > > > > > > > of the actual pattern and to be eventually optimized by > > > > > > > > > permute optimization? Because it's still cheaper than > > > > > > > > > what we have from the two-operator handling, namely > > > > > > > > > add, subtract and permute. The SLP trees pasted above > > > > > > > > > do suggest that you add the anticipated permute operation > > > > > > > > > so I wonder whether all the linearization is just premature here? > > > > > > > > > > > > > > > > Consider add270: > > > > > > > > > > > > > > > > for (int i=0; i < N; i++) > > > > > > > > c[i] = a[i] + (b[i] * I * I * I); > > > > > > > > > > > > > > > > note: Final SLP tree for instance 0x4461b30: > > > > > > > > note: node 0x436c9c0 (max_nunits=4, refcnt=2) > > > > > > > > note: op template: REALPART_EXPR <*_10> = _23; > > > > > > > > note: stmt 0 REALPART_EXPR <*_10> = _23; > > > > > > > > note: stmt 1 IMAGPART_EXPR <*_10> = _4; > > > > > > > > note: children 0x436ca38 > > > > > > > > note: node 0x436ca38 (max_nunits=4, refcnt=2) > > > > > > > > note: op: VEC_PERM_EXPR > > > > > > > > note: stmt 0 _23 = _6 + _13; > > > > > > > > note: stmt 1 _4 = _12 - _7; > > > > > > > > note: lane permutation { 0[0] 1[1] } > > > > > > > > note: children 0x436cba0 0x436cc18 > > > > > > > > note: node 0x436cba0 (max_nunits=1, refcnt=1) > > > > > > > > note: op template: _23 = _6 + _13; > > > > > > > > note: { } > > > > > > > > note: children 0x436cab0 0x436cb28 > > > > > > > > note: node 0x436cab0 (max_nunits=4, refcnt=3) > > > > > > > > note: op template: _13 = REALPART_EXPR <*_3>; > > > > > > > > note: stmt 0 _13 = REALPART_EXPR <*_3>; > > > > > > > > note: stmt 1 _12 = IMAGPART_EXPR <*_3>; > > > > > > > > note: load permutation { 0 1 } > > > > > > > > note: node 0x436cb28 (max_nunits=4, refcnt=3) > > > > > > > > note: op template: _6 = IMAGPART_EXPR <*_5>; > > > > > > > > note: stmt 0 _6 = IMAGPART_EXPR <*_5>; > > > > > > > > note: stmt 1 _7 = REALPART_EXPR <*_5>; > > > > > > > > note: load permutation { 1 0 } > > > > > > > > note: node 0x436cc18 (max_nunits=1, refcnt=1) > > > > > > > > note: op template: _4 = _12 - _7; > > > > > > > > note: { } > > > > > > > > note: children 0x436cab0 0x436cb28 > > > > > > > > > > > > > > > > and add_conj: > > > > > > > > > > > > > > > > for (int i=0; i < N; i++) > > > > > > > > c[i] = a[i] + conjf (b[i]); > > > > > > > > > > > > > > > > note: Final SLP tree for instance 0x4fbf5a0: > > > > > > > > note: node 0x505d910 (max_nunits=4, refcnt=2) > > > > > > > > note: op template: REALPART_EXPR <*_8> = _23; > > > > > > > > note: stmt 0 REALPART_EXPR <*_8> = _23; > > > > > > > > note: stmt 1 IMAGPART_EXPR <*_8> = _4; > > > > > > > > note: children 0x505d988 > > > > > > > > note: node 0x505d988 (max_nunits=4, refcnt=2) > > > > > > > > note: op: VEC_PERM_EXPR > > > > > > > > note: stmt 0 _23 = _11 + _20; > > > > > > > > note: stmt 1 _4 = _10 - _19; > > > > > > > > note: lane permutation { 0[0] 1[1] } > > > > > > > > note: children 0x505daf0 0x505db68 > > > > > > > > note: node 0x505daf0 (max_nunits=1, refcnt=1) > > > > > > > > note: op template: _23 = _11 + _20; > > > > > > > > note: { } > > > > > > > > note: children 0x505da00 0x505da78 > > > > > > > > note: node 0x505da00 (max_nunits=4, refcnt=3) > > > > > > > > note: op template: _11 = REALPART_EXPR <*_3>; > > > > > > > > note: stmt 0 _11 = REALPART_EXPR <*_3>; > > > > > > > > note: stmt 1 _10 = IMAGPART_EXPR <*_3>; > > > > > > > > note: load permutation { 0 1 } > > > > > > > > note: node 0x505da78 (max_nunits=4, refcnt=3) > > > > > > > > note: op template: _20 = REALPART_EXPR <*_5>; > > > > > > > > note: stmt 0 _20 = REALPART_EXPR <*_5>; > > > > > > > > note: stmt 1 _19 = IMAGPART_EXPR <*_5>; > > > > > > > > note: load permutation { 0 1 } > > > > > > > > note: node 0x505db68 (max_nunits=1, refcnt=1) > > > > > > > > note: op template: _4 = _10 - _19; > > > > > > > > note: { } > > > > > > > > note: children 0x505da00 0x505da78 > > > > > > > > > > > > > > > > These are virtually identical. Aside from the first one having a > > > > permute in > > > > > > > > 0x436cb28 being {1, 0} and the one in 0x505da78 being {0, 1}. But > > they > > > > > > > > are quite different operations. (in fact the conj case seems to > > match > > > > what > > > > > > x86 has). > > > > > > > > > > > > > > > > So the problem with not checking the permutes is that you would > > > > treat > > > > > > both of these > > > > > > > > the same and emit the instruction with the permute. Which > > would > > > > > > produce correct > > > > > > > > code but not necessarily efficient code. > > > > > > > > > > > > > > > > Swapping a {0, 1} permute is trivial, but accepting it means > > accepting > > > > any > > > > > > random permute > > > > > > > > where either the permute requires a general permute operation > > (TBL) > > > > > > which we cost quite > > > > > > > > high due to it's impact on register allocation and the fact it requires > > an > > > > > > index register to be > > > > > > > > loaded from memory. > > > > > > > > > > > > > > Hmm. With having all these subtly different operations natively > > > > available > > > > > > > this indeed complicates things. But then given a even/odd > > plus/minus > > > > > > > operation without a way to infer what permutation we are looking > > at > > > > > > > is there a good choice as to which of the even/odd lane instructions > > we > > > > > > > want to match? It sounds add_conj it should be, no? > > > > > > > > > > > > > > That said, it looks like a ordering issue with the permute > > optimization > > > > > > > phase to me. > > > > > > > > > > > > > > So if we go with some heuristic then what you try to do is figure > > > > > > > if one of the operands of the pattern matched operation is already > > > > > > > perfectly linear. For the operand the instruction can do a > > permutation > > > > > > > the exact permute cannot matter since you don't seem to compute > > an > > > > > > > exact permute but emit the "anticipated" one and leave the rest to > > > > > > > be (hopefully) optimized later. The important part (cost-wise) > > seems > > > > > > > to be to not anticipate a permute where there is none. > > > > > > > > > > > > > > > This means we will likely end up rejecting such cases based on cost > > > > alone > > > > > > and no longer > > > > > > > > vectorize in these cases. > > > > > > > > > > > > > > Is that so? Without matching any pattern you'd have a vector plus > > and > > > > > > > a vector minus and then a tbl combining both? > > > > > > > > > > > > > > > The other case is when I don't even know how to make it "fit" in > > the > > > > > > instruction. Consider: > > > > > > > > > > > > > > > > for (int i=0; i < N; i+=2) > > > > > > > > { > > > > > > > > c[i] = a[i] - b[i]; > > > > > > > > c[i+1] = a[i+1] + b[i]; > > > > > > > > } > > > > > > > > > > > > > > > > Which becomes > > > > > > > > > > > > > > > > note: Final SLP tree for instance 0x44e25a0: > > > > > > > > note: node 0x45703e0 (max_nunits=2, refcnt=2) > > > > > > > > note: op template: *_7 = _8; > > > > > > > > note: stmt 0 *_7 = _8; > > > > > > > > note: stmt 1 *_13 = _14; > > > > > > > > note: children 0x4570458 > > > > > > > > note: node 0x4570458 (max_nunits=2, refcnt=2) > > > > > > > > note: op: VEC_PERM_EXPR > > > > > > > > note: stmt 0 _8 = _4 - _6; > > > > > > > > note: stmt 1 _14 = _6 + _12; > > > > > > > > note: lane permutation { 0[0] 1[1] } > > > > > > > > note: children 0x45705c0 0x4570638 > > > > > > > > note: node 0x45705c0 (max_nunits=1, refcnt=1) > > > > > > > > note: op template: _8 = _4 - _6; > > > > > > > > note: { } > > > > > > > > note: children 0x45704d0 0x4570548 > > > > > > > > note: node 0x45704d0 (max_nunits=2, refcnt=3) > > > > > > > > note: op template: _4 = *_3; > > > > > > > > note: stmt 0 _4 = *_3; > > > > > > > > note: stmt 1 _12 = *_11; > > > > > > > > note: load permutation { 0 1 } > > > > > > > > note: node 0x4570548 (max_nunits=2, refcnt=3) > > > > > > > > note: op template: _6 = *_5; > > > > > > > > note: stmt 0 _6 = *_5; > > > > > > > > note: stmt 1 _6 = *_5; > > > > > > > > note: load permutation { 0 0 } > > > > > > > > note: node 0x4570638 (max_nunits=1, refcnt=1) > > > > > > > > note: op template: _14 = _6 + _12; > > > > > > > > note: { } > > > > > > > > note: children 0x45704d0 0x4570548 > > > > > > > > > > > > > > > > Which I would need to work out on pen and paper to see if it can > > > > even > > > > > > work > > > > > > > > With the instruction.. (we generate quite awful code for this atm > > with > > > > > > float). > > > > > > > > > > > > > > Well, clearly the simple-minded match would add a perm node in > > > > > > > front of the b[i] load one and the permute optimization phase > > > > > > > would currently not elide it as no-op (or maybe it does, surely > > > > > > > it could). > > > > > > > > > > > > > > > So the problem here is I can't go back to the old code should > > costing > > > > > > become > > > > > > > > very expensive because of the permute it would need to insert. > > > > > > > > > > > > > > > > So I needed somewhat to reject the cases I know wouldn't > > generate > > > > > > good code. > > > > > > > > > > > > > > Yes - I think we do need to know the pattern is an obvious > > > > improvement > > > > > > > to the non-pattern state. But I think it should always be due to the > > > > > > > removed add or subtract instruction? Or are the complex > > instructions > > > > > > > more expensive than a single add or subtract? > > > > > > > > > > > > > > > > > > > > > > > > > How would we name the x86 instruction patterns which > > implement > > > > > > > > > > > > > > > > > > a[i] = b[i] - c[i]; > > > > > > > > > a[i+1] = b[i+1] + c[i+1]; > > > > > > > > > > > > > > > > > > ? Those do not implement a full complex operation AFAICS > > > > > > > > > so would we name them plusminus3 and > > > > minusplus3 > > > > > > > > > and fmas4, fmsa4? They'd be the prefered > > match > > > > > > > > > (no anticipated permute necessary)? > > > > > > > > > > > > > > > > Yes, that makes sense. If the instructions have no expectations of > > a > > > > > > > > permute. > > > > > > > > > > > > > > > > So the difficult part here is I don't know how to find the right > > balance. > > > > > > > > You're right in that we should be able to accept the add_conj case > > and > > > > > > > > Just emit a permute there, as we have a single instruction for that > > > > > > permute. > > > > > > > > > > > > > > > > I also agree with you that it shouldn't be doing "costing" so early > > on, > > > > > > > > But if I don't do so, my only choices here are that it turns out to be > > > > cheap > > > > > > to do so WIN, > > > > > > > > or it turns out to be expensive to do and we fail vectorization > > entirely > > > > > > (well the loop vectorizer > > > > > > > > would probably try without SLP enabled and generate > > *something*, > > > > but > > > > > > > > the non-loop SLP is a bit out of luck..). > > > > > > > > > > > > > > > > If only there was a way to compare the costs for the non pattern > > > > > > matched tree vs the > > > > > > > > pattern matched one. But that would be quite a big addition at > > this > > > > point. > > > > > > > > > > > > > > But what matters is of course the cost after permute optimization > > did > > > > > > > its work. > > > > > > > > > > > > > > So I wonder if we can match cadd_conj during pattern matching and > > > > > > > wire turning that into cadd90/270 during optimize_slp when we > > know > > > > > > > the permute that is coming along the child? Yes, that would put > > > > > > > knowledge of all of it into that point but thinking of this as > > > > > > > all doable in a separate pattern matching (without re-implementing > > > > > > > all of the permute optimization) doesn't look like it will work? > > > > > > > > > > > > > > That is, when materializing a permute on a cadd_conj child we > > > > > > > can instead turn it into a cadd90/270? We probably need to turn > > > > > > > the materialization loop into an ordered one based on the RPO > > > > > > > order computed earlier. > > > > > > > > > > > > > > And if we just match cadd_conj (and the variant with even/odd > > > > > > > swapped) we could do this directly during SLP discovery as well > > > > > > > where we handle two_operators. Do you have > > > > > > > > > > > > > > Now the question is of course how this interacts with mul and fma/s > > > > > > > but I guess it's always the adds that introduce all the variants. > > > > > > > The mla/mls patterns have a comment > > > > > > > > > > > > > > +;; The complex mla/mls operations always need to expand to two > > > > > > > instructions. > > > > > > > +;; The first operation does half the computation and the second > > does > > > > the > > > > > > > +;; remainder. Because of this, expand early. > > > > > > > > > > > > > > so what are the building blocks there? It makes it sound like > > > > > > > this is a widening multiplication or so? > > > > > > > > > > > > So following up myself after reading the ARM docs regarding to > > > > > > those. It seems this is about FCMLA where two of those can be > > > > > > used to perform full complex multiplication. I think we want > > > > > > to model the individual FCMLA operations and not the complex > > > > > > multiplication itself and also expose the FCMLAs as optabs, > > > > > > not complex multiplication. There seem to be four variants > > > > > > (as opposed to the two cadd ones). > > > > > > > > > > > > rot '00' > > > > > > a[2*i] += b[2*i] * c[2*i] > > > > > > a[2*i+1] += b[2*i] * c[2*i+1] > > > > > > > > > > > > rot '01' > > > > > > a[2*i] += b[2*i+1] * -c[2*i+1] > > > > > > a[2*i+1] += b[2*i+1] * c[2*i] > > > > > > > > > > > > rot '10' > > > > > > a[2*i] += b[2*i] * -c[2*i] > > > > > > a[2*i+1] += b[2*i] * -c[2*i+1] > > > > > > > > > > > > rot '11' > > > > > > a[2*i] += b[2*i+1] * c[2*i+1] > > > > > > a[2*i+1] += b[2*i+1] * -c[2*i] > > > > > > > > > > > > where in practice we'll see the negate handled by turning > > > > > > the add into a subtract which then means the thing to > > > > > > pattern match is scalar by vector multiplication? Again > > > > > > "which" scalar (lane) against which permute of the other > > > > > > vector is going to interact with permute optimizations. > > > > > > But that leaves us with almost nothing special from a regular > > > > > > multiplication - the mixed sign operation will be the add > > > > > > again ... > > > > > > > > > > But the difficulty here is that you need to have both calculations of > > > > > Rot '00' and rot '01' for instance work together, not in parallel. > > > > > > > > > > That is, you have to have Rot '00' go before Rot '01' so the accumulation > > > > > value is correct. You also don't want the vectorizer to think it needs a > > load > > > > > duplicate. Since e.g. rot '00' only uses b[2*i] it would need a lane perm > > > > loading > > > > > [0 0] and you don't want to materialize that. > > > > > > > > > > Partially due to the costing, but also it's really hard to undo permutes in > > RTL. > > > > > > > > > > So I don't think treating them as separate instructions is the best thing > > here. > > > > > > > > > > If instead it's treated like semantically what you want to do this gives > > me > > > > some > > > > > freedom to operate. E.g. We don't have these instructions in NEON > > and > > > > SVE1 on > > > > > integers. But for certain modes they're easy to emulate. > > > > > > > > > > It's a lot easier for a target just to have to implement COMPLEX_MUL > > > > rather than > > > > > the AArch64 semantics for these instructions. > > > > > > > > That's true but we'd leave using the instructions for cases where it > > > > really only does "half" of the COMPLEX_MUL on the plate. On x86 > > > > the instructions are again even less capable by omitting the permute > > > > and just doing even/odd plus-minus FMA variants. > > > > > > But I really don't see how this can be done. > > > > > > If you only have half the operation, you end up with: > > > > > > rot '00' > > > a[2*i] += b[2*i] * c[2*i] > > > a[2*i+1] += b[2*i] * c[2*i+1] > > > > > > Which afaik can't be done on the scalar pattern matcher because when > > seeing a[2*i] you'd > > > need to know about a[2*i+1]. > > > > > > If you do it after SLP construction that's a completely different tree than > > you'd get from > > > COMPLEX_MUL. So this form should never appear in your SLP tree if you > > have both operations > > > to form a valid complex operation. > > > > But the sequence of adds/mults and permutes can appear outside of > > complex > > context. And the SLP pattern matcher would miss the above even though > > there's a 1:1 instruction available because it only looks for the > > combination of two instructions which make up a complex multiplication. > > > > > > So looking at the patch again I see > > > > > > > > void > > > > complex_pattern::build (slp_tree_to_load_perm_map_t *perm_cache, > > > > vec_info *vinfo) > > > > { > > > > ... > > > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), ix, tmp) > > > > { > > > > slp_tree vnode = NULL; > > > > if (vect_slp_make_linear (perm_cache, node, tmp, &vnode)) > > > > nodes.safe_push (vnode); > > > > > > > > so we're relying on an exact precise lane order being detected > > > > by the linear stuff rather than it being a heuristic. > > > > > > > > I'd have materialized the very specific reverse permute > > > > anticipated by the actual chosen complex IFN on the second > > > > operand. That's never going to be incorrect then, at most > > > > sub-optimal. Your variant might be incorrect and also > > > > sub-optimal (you still rely on permute optimization to > > > > cancel the linearization permute). > > > > > > But to materialize the reverse permute I'd have to know the > > > Original permute. But what happens if you have both ADDSUB > > > And COMPLEX_ADD? For SVE for instance we can easily emulate > > > ADDSUB using predication, which is likely cheaper if the data requires > > > no permutation.. > > > > Well, if the pattern is a CADD90 then there is a specific permute > > done on the second operand by this very pattern operation. You > > then insert the reverse on the edge to the second operand. > > > > The permute is specified by the CADD90, not by whatever permute > > arrives because you have to reflect what CADD90 does to the > > rest of the SLP tree. No? > > > > > > I hope the specific review comments do not get lost in the > > > > thread discussion the general approach of matching the ARM > > > > complex ops ;) > > > > > > > > > > I usually extract them into one place before I start working. > > > > > > But atm I'm stuck a bit as I don't think we've agreed on an approach > > > that would also work for MUL and MLA. > > > > Honestly I don't have a good idea that is guaranteed to work. I think > > you showed that your approach works to the extent you tested it and > > thus this is the way forward if we want to make GCC 11. As said on > > IRC these patterns somewhat feel like they need a global [permute] > > optimization framework rather than a local pattern matching > > (the Intel x86 vector extensions with just even/odd lane negates > > would be pure local matches). > > > > So below are some more comments on the lane tracking. > > > > static load_permutation_t > > linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root, > > bool *linear) > > { > > ... > > if ((tmp = perm_cache->get (root)) != NULL) > > { > > *linear = is_linear_load_p (*tmp); > > return *tmp; > > } > > > > it would be nice to avoid is_linear_load_p on cached entries which > > means we'd like to reflect it in the cache itself. From what I > > understand (but what is not documented), the following holds for > > cache entries: > > > > vNULL - nothing is known about 'node' (but we've visited it), kind of > > VARYING > > lperm - the lanes are permuted according to lperm in 'node' based on > > some unknown node(s) > > > > we could make the cache entry a std::pair > > with the enum denoting UNKNOWN, LINEAR, and PERMUTED where for the > > first two the load_permutation_t could be vNULL. > > > > /* If it's a load node, then just read the load permute. */ > > if (SLP_TREE_LOAD_PERMUTATION (root).exists ()) > > { > > loads = SLP_TREE_LOAD_PERMUTATION (root); > > perm_cache->put (root, loads); > > if (!is_linear_load_p (loads)) > > return loads; > > > > since loads are terminal we can always return 'loads' and > > init *linear from is_linear_load_p (loads)? > > > > And as said elsewhere I'd simply treat vect_external_defs and > > > > else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def) > > return vNULL; > > > > as linear. Alternatively there could be a fourth state, VTOP, > > meaning to merge with any other state (aka, we can permute > > externals as we wish at no cost). > > > > slp_tree child; > > FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child) > > { > > loads = linear_loads_p (perm_cache, child, linear); > > if ((!*linear && !is_perm) || !loads.exists ()) > > return loads; > > > > all_loads.safe_push (loads); > > } > > > > at merges we have to treat any UNKNOWN child by returning > > UNKNOWN, any VTOP child we can ignore (or pass on if all > > are VTOP), and both LINEAR and PERMUTED needs to match > > for all children to do anything sensible, otherwise we > > need to fall back to UNKNOWN. > > > > Unless we can conservatively treat UNKNOWN as LINEAR > > (just assume we're starting a new vector here). > > > > if (is_perm) > > { > > > > and this then permutes the common lane state. > > > > Since you pre-load with vNULL anything participating in > > cycles will drop to UNKNOWN, but I guess that's fine. > > > > I think we should not need vect_slp_make_linear at all. > > Each and every pattern recognized has a specific intrinsic > > permute we have to reflect - the permute analysis above > > is just to decide which of the patterns we want to choose - it > > can be seen as a heuristic (for anything external or for > > the case we go from UNKNOWN to newly LINEAR). > > Don't I still need it? Albeit a simplified form since I need to > materialize the inverse permute? > > But what It doesn't need to do anymore is re-analyze the permute? Yes, you don't need vect_slp_make_linear or do any analysis, you simply based on the chosen IFN materialize the inverse permute as to what the IFN does internally. So if cadd90 internally permutes operand 2 as { 1, 0 } then you emit the inverse on the child node (which is also { 1, 0 } here). The idea is of course that the internal permute and the emitted cancel. Richard. > Regards, > Tamar > > > > > That would leave the multiplication case were you want to > > merge the splat uses { a[0], a[0] } and { a[1], a[1] }. > > But there it's the same as with the intrinsic permutes > > we model - we have a extract so we anticipate a merge. > > Sth like > > > > note: node 0x4779a00 (max_nunits=4, refcnt=2) > > note: op template: _9 = IMAGPART_EXPR <*_3>; > > note: stmt 0 _9 = IMAGPART_EXPR <*_3>; > > note: stmt 1 _9 = IMAGPART_EXPR <*_3>; > > note: load permutation { 1 1 } > > note: node 0x4779898 (max_nunits=4, refcnt=2) > > note: op template: _10 = REALPART_EXPR <*_3>; > > note: stmt 0 _10 = REALPART_EXPR <*_3>; > > note: stmt 1 _10 = REALPART_EXPR <*_3>; > > note: load permutation { 0 0 } > > > > add: > > > > node > > op: VEC_PERM_EXPR > > lane permutation { 0[0], 1[1] } > > children 0x4779898 0x4779a00 > > > > note when both children are external this will currently > > break so we do have to "fold" that by simplifying it to > > a new external node extracting from the appropriate lanes. > > > > So I think it should work doing it this way. Then the > > most simplistic lane permute analysis would just look > > at the node itself and treat any not load-permuted node > > as LINEAR while returning the actual load permutation > > for load-permuted nodes. > > > > Richard. > > > > > > > Regards, > > > Tamar > > > > > > > Richard. > > > > > > > > > > > > > > > > > Cheers, > > > > > Tamar > > > > > > > > > > > > > > > > > It might be feasible to handle the case of the SLP children > > > > > > being loads themselves in the pattern matching process but > > > > > > I guess you've run into more complex situations since you > > > > > > implemented that "propagation" stuff? The testcases included > > > > > > on the branch seem to be simple direct cases of the ops operating > > > > > > on memory. > > > > > > > > > > > > So in the end matching the ARM operations boils down to > > > > > > exactly tracing participating lanes which sounds more like > > > > > > a dataflow problem rather than a simple (local) pattern matching > > > > > > one. > > > > > > > > > > > > Meh. > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > > > Unfortunately > > > > > > > the patterns are half regular RTL and half unspec so they don't > > > > > > > really specify what is done semantically :/ It would be nice > > > > > > > if the patches with the aarch64 backend changes would be on > > > > > > > trunk already ... (on the branch I don't see anything related > > > > > > > to add_conj for example) > > > > > > > > > > > > > > Btw, do you have any real-world cases that we want to optimize > > > > > > > where there's more than a single to-be-matched operation > > > > > > > operating on memory? > > > > > > > > > > > > > > Thanks, > > > > > > > Richard. > > > > > > > > > > > > > > > Regards, > > > > > > > > Tamar > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks (I hope we can simplify stuff further), > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Tamar > > > > > > > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > > > > > > > * tree-vect-slp-patterns.c: New file. > > > > > > > > > > * Makefile.in: Add it. > > > > > > > > > > * doc/passes.texi: Document it. > > > > > > > > > > * internal-fn.def (COMPLEX_ADD_ROT90, > > > > COMPLEX_ADD_ROT270): > > > > > > > > > New. > > > > > > > > > > * optabs.def (cadd90_optab, cadd270_optab): New. > > > > > > > > > > * doc/md.texi: Document them. > > > > > > > > > > * tree-vect-slp.c: > > > > > > > > > > (vect_free_slp_instance, vect_create_new_slp_node): > > > > Export. > > > > > > > > > > (vect_match_slp_patterns_2, vect_match_slp_patterns): > > > > New. > > > > > > > > > > (vect_analyze_slp): Use it. > > > > > > > > > > * tree-vectorizer.h (vect_free_slp_tree): Export. > > > > > > > > > > (enum _complex_operation): Forward declare. > > > > > > > > > > (class vect_pattern): New > > > > > > > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > > > > > > > * lib/target-supports.exp > > > > > > > > > > > > > > (check_effective_target_arm_v8_3a_complex_neon_ok_nocache): > > > > > > > > > Fix it. > > > > > > > > > > (check_effective_target_vect_complex_add_byte > > > > > > > > > > ,check_effective_target_vect_complex_add_int > > > > > > > > > > ,check_effective_target_vect_complex_add_short > > > > > > > > > > ,check_effective_target_vect_complex_add_long > > > > > > > > > > ,check_effective_target_vect_complex_add_half > > > > > > > > > > ,check_effective_target_vect_complex_add_float > > > > > > > > > > ,check_effective_target_vect_complex_add_double): New. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-byte.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-int.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-long.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern- > > byte.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern- > > long.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern- > > short.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern- > > > > unsigned- > > > > > > byte.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern- > > > > unsigned- > > > > > > int.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern- > > > > unsigned- > > > > > > long.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-pattern- > > > > unsigned- > > > > > > short.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-short.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-unsigned- > > byte.c: > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-unsigned- > > int.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-unsigned- > > long.c: > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/bb-slp-complex-add-unsigned- > > short.c: > > > > > > New > > > > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/complex-add-pattern-template.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/complex-add-template.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/complex-operations-run.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/complex-operations.c: New test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add- > > > > double.c: > > > > > > New > > > > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add- > > float.c: > > > > > > New > > > > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add- > > half- > > > > > > float.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add- > > > > pattern- > > > > > > > > > double.c: New test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add- > > > > pattern- > > > > > > float.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-bb-slp-complex-add- > > > > pattern- > > > > > > half- > > > > > > > > > float.c: New test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-double.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-float.c: > > New > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-half- > > float.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-pattern- > > > > double.c: > > > > > > New > > > > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-pattern- > > > > float.c: > > > > > > New > > > > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/fast-math-complex-add-pattern- > > half- > > > > > > float.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-byte.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-int.c: New test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-long.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-int.c: > > New > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern-short.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern- > > unsigned- > > > > > > byte.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern- > > unsigned- > > > > int.c: > > > > > > New > > > > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern- > > unsigned- > > > > > > long.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-pattern- > > unsigned- > > > > > > short.c: > > > > > > > > > New test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-short.c: New > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-unsigned- > > byte.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-unsigned-int.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-unsigned-long.c: > > > > New > > > > > > test. > > > > > > > > > > * gcc.dg/vect/complex/vect-complex-add-unsigned- > > short.c: > > > > New > > > > > > test. > > > > > > > > > > > > > > > > > > > > --- inline copy of patch -- > > > > > > > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 778ec09c75d9af1cb9f2d5e7582b948c0397db65..d80657b089829fa30cede8bcf > > > > > > > > > e036dda0ec06682 100644 > > > > > > > > > > --- a/gcc/Makefile.in > > > > > > > > > > +++ b/gcc/Makefile.in > > > > > > > > > > @@ -1646,6 +1646,7 @@ OBJS = \ > > > > > > > > > > tree-vect-loop.o \ > > > > > > > > > > tree-vect-loop-manip.o \ > > > > > > > > > > tree-vect-slp.o \ > > > > > > > > > > + tree-vect-slp-patterns.o \ > > > > > > > > > > tree-vectorizer.o \ > > > > > > > > > > tree-vector-builder.o \ > > > > > > > > > > tree-vrp.o \ > > > > > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > da8c9a283dd42e2b3078ed5f370a37180ee0b538..2a030a1d7373cd2b5837aa1c > > > > > > > > > 99936a6a4e4e1480 100644 > > > > > > > > > > --- a/gcc/doc/md.texi > > > > > > > > > > +++ b/gcc/doc/md.texi > > > > > > > > > > @@ -6154,6 +6154,54 @@ floating-point mode. > > > > > > > > > > > > > > > > > > > > This pattern is not allowed to @code{FAIL}. > > > > > > > > > > > > > > > > > > > > +@cindex @code{cadd90@var{m}3} instruction pattern > > > > > > > > > > +@item @samp{cadd90@var{m}3} > > > > > > > > > > +Perform vector add and subtract on even/odd number pairs. > > > > The > > > > > > > > > operation being > > > > > > > > > > +matched is semantically described as > > > > > > > > > > + > > > > > > > > > > +@smallexample > > > > > > > > > > + for (int i = 0; i < N; i += 2) > > > > > > > > > > + @{ > > > > > > > > > > + c[i] = a[i] - b[i+1]; > > > > > > > > > > + c[i+1] = a[i+1] + b[i]; > > > > > > > > > > + @} > > > > > > > > > > +@end smallexample > > > > > > > > > > + > > > > > > > > > > +This operation is semantically equivalent to performing a > > vector > > > > > > addition > > > > > > > > > of > > > > > > > > > > +complex numbers in operand 1 with operand 2 rotated by 90 > > > > > > degrees > > > > > > > > > around > > > > > > > > > > +the argand plane and storing the result in operand 0. > > > > > > > > > > + > > > > > > > > > > +In GCC lane ordering the real part of the number must be in > > the > > > > > > even > > > > > > > > > lanes with > > > > > > > > > > +the imaginary part in the odd lanes. > > > > > > > > > > + > > > > > > > > > > +The operation is only supported for vector modes @var{m}. > > > > > > > > > > + > > > > > > > > > > +This pattern is not allowed to @code{FAIL}. > > > > > > > > > > + > > > > > > > > > > +@cindex @code{cadd270@var{m}3} instruction pattern > > > > > > > > > > +@item @samp{cadd270@var{m}3} > > > > > > > > > > +Perform vector add and subtract on even/odd number pairs. > > > > The > > > > > > > > > operation being > > > > > > > > > > +matched is semantically described as > > > > > > > > > > + > > > > > > > > > > +@smallexample > > > > > > > > > > + for (int i = 0; i < N; i += 2) > > > > > > > > > > + @{ > > > > > > > > > > + c[i] = a[i] + b[i+1]; > > > > > > > > > > + c[i+1] = a[i+1] - b[i]; > > > > > > > > > > + @} > > > > > > > > > > +@end smallexample > > > > > > > > > > + > > > > > > > > > > +This operation is semantically equivalent to performing a > > vector > > > > > > addition > > > > > > > > > of > > > > > > > > > > +complex numbers in operand 1 with operand 2 rotated by > > 270 > > > > > > degrees > > > > > > > > > around > > > > > > > > > > +the argand plane and storing the result in operand 0. > > > > > > > > > > + > > > > > > > > > > +In GCC lane ordering the real part of the number must be in > > the > > > > > > even > > > > > > > > > lanes with > > > > > > > > > > +the imaginary part in the odd lanes. > > > > > > > > > > + > > > > > > > > > > +The operation is only supported for vector modes @var{m}. > > > > > > > > > > + > > > > > > > > > > +This pattern is not allowed to @code{FAIL}. > > > > > > > > > > + > > > > > > > > > > @cindex @code{ffs@var{m}2} instruction pattern > > > > > > > > > > @item @samp{ffs@var{m}2} > > > > > > > > > > Store into operand 0 one plus the index of the least significant > > 1- > > > > bit > > > > > > > > > > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99 > > > > > > > > > a23386891a7b0c1 100644 > > > > > > > > > > --- a/gcc/doc/passes.texi > > > > > > > > > > +++ b/gcc/doc/passes.texi > > > > > > > > > > @@ -709,7 +709,8 @@ loop. > > > > > > > > > > The pass is implemented in @file{tree-vectorizer.c} (the main > > > > driver), > > > > > > > > > > @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} > > (loop > > > > > > specific > > > > > > > > > parts > > > > > > > > > > and general loop utilities), @file{tree-vect-slp} (loop-aware > > SLP > > > > > > > > > > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect- > > data- > > > > > > refs.c}. > > > > > > > > > > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data- > > > > refs.c} > > > > > > and > > > > > > > > > > +@file{tree-vect-slp-patterns.c} containing the SLP pattern > > > > matcher. > > > > > > > > > > Analysis of data references is in @file{tree-data-ref.c}. > > > > > > > > > > > > > > > > > > > > SLP Vectorization. This pass performs vectorization of > > straight- > > > > line > > > > > > code. > > > > > > > > > The > > > > > > > > > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1 > > > > > > > > > 148df8cf4a509b5 100644 > > > > > > > > > > --- a/gcc/internal-fn.def > > > > > > > > > > +++ b/gcc/internal-fn.def > > > > > > > > > > @@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, > > > > ECF_CONST, > > > > > > scalb, > > > > > > > > > binary) > > > > > > > > > > DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, > > > > binary) > > > > > > > > > > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, > > > > binary) > > > > > > > > > > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, > > > > binary) > > > > > > > > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, > > ECF_CONST, > > > > > > cadd90, > > > > > > > > > binary) > > > > > > > > > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, > > > > ECF_CONST, > > > > > > > > > cadd270, binary) > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > /* FP scales. */ > > > > > > > > > > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > > > > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 5607f51e6b4b775a92d1d8ffcd3e9b53e9270d6c..e9727def4dbf941bb9ac8b56f > > > > > > > > > 83f8ea0f52b262c 100644 > > > > > > > > > > --- a/gcc/optabs.def > > > > > > > > > > +++ b/gcc/optabs.def > > > > > > > > > > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2") > > > > > > > > > > OPTAB_D (atanh_optab, "atanh$a2") > > > > > > > > > > OPTAB_D (copysign_optab, "copysign$F$a3") > > > > > > > > > > OPTAB_D (xorsign_optab, "xorsign$F$a3") > > > > > > > > > > +OPTAB_D (cadd90_optab, "cadd90$a3") > > > > > > > > > > +OPTAB_D (cadd270_optab, "cadd270$a3") > > > > > > > > > > OPTAB_D (cos_optab, "cos$a2") > > > > > > > > > > OPTAB_D (cosh_optab, "cosh$a2") > > > > > > > > > > OPTAB_D (exp10_optab, "exp10$a2") > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > byte.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add- > > byte.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..3b1e0837a323364c55094240b > > > > > > > > > 21dcc4938fa37c2 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > byte.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int8_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > int.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-int.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..33d3d13d629bb831272609c48 > > > > > > > > > 4c78e6d19a7b930 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > int.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int32_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > long.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add- > > long.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..54d0f1d6864c41fc656eeb1af3 > > > > > > > > > 2736ad37dcf381 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > long.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int64_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > pattern-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > pattern-byte.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..fac77f7b626c985e4b033818a1 > > > > > > > > > 0f126784d5a9a6 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > pattern- > > > > > > > > > byte.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int8_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > pattern-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > > > add- > > > > > > > > > pattern-int.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..41a836c10c8f2f45a521912186 > > > > > > > > > ab8ac5393f69fd > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > pattern- > > > > > > > > > int.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int32_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > pattern-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > pattern-long.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..175f51c46d125578520b5205c8 > > > > > > > > > 6ca8a836174a2f > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > pattern- > > > > > > > > > long.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int64_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > pattern-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > pattern-short.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..c4fe72712a4d90bb5e89e6f6b > > > > > > > > > 2359029715c0bd8 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > pattern- > > > > > > > > > short.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int16_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > pattern-unsigned-byte.c > > b/gcc/testsuite/gcc.dg/vect/complex/bb- > > > > slp- > > > > > > > > > complex-add-pattern-unsigned-byte.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..534a4201d54f73e0419c99a599 > > > > > > > > > 55900b473107c8 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > pattern- > > > > > > > > > unsigned-byte.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint8_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > pattern-unsigned-int.c > > b/gcc/testsuite/gcc.dg/vect/complex/bb- > > > > slp- > > > > > > > > > complex-add-pattern-unsigned-int.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..9e3cf8062668b87962e0c71710 > > > > > > > > > 579939f950651c > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > pattern- > > > > > > > > > unsigned-int.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint32_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > pattern-unsigned-long.c > > b/gcc/testsuite/gcc.dg/vect/complex/bb- > > > > slp- > > > > > > > > > complex-add-pattern-unsigned-long.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..398fc94154c88f2f9088910e50c > > > > > > > > > 3c1d4cc0ce17f > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > pattern- > > > > > > > > > unsigned-long.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint64_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > pattern-unsigned-short.c > > b/gcc/testsuite/gcc.dg/vect/complex/bb- > > > > slp- > > > > > > > > > complex-add-pattern-unsigned-short.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..7326d29d86c27056705c6287d > > > > > > > > > a41dd0b85d5cc35 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > pattern- > > > > > > > > > unsigned-short.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint16_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > short.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add- > > short.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..c1ce663dc7ab09875a06ad503 > > > > > > > > > 81acc955dfd1fff > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > short.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int16_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > unsigned-byte.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..8d0c817fdae8e6ff6cdc665d6a > > > > > > > > > 132b4fc322ea61 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > unsigned- > > > > > > > > > byte.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint8_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > unsigned-int.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..3b08ecd0dd80f949ab88d7e74 > > > > > > > > > 7602bb99fea7acc > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > unsigned- > > > > > > > > > int.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint32_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > unsigned-long.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..4e069ee8297064dcad7447fff6 > > > > > > > > > 012a10a34543e3 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > unsigned- > > > > > > > > > long.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint64_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > complex- > > > > add- > > > > > > > > > unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/bb-slp- > > > > > > complex-add- > > > > > > > > > unsigned-short.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..88d21abd3c8ee59901df645cf5 > > > > > > > > > c036c548cc6b1c > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex- > > add- > > > > > > unsigned- > > > > > > > > > short.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint16_t > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add- > > > > pattern- > > > > > > > > > template.c b/gcc/testsuite/gcc.dg/vect/complex/complex-add- > > > > > > pattern- > > > > > > > > > template.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..e8b8b19d1708673b17564b31d > > > > > > > > > 22df3443d667277 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add- > > pattern- > > > > > > > > > template.c > > > > > > > > > > @@ -0,0 +1,60 @@ > > > > > > > > > > +void add90 (TYPE a[restrict N], TYPE b[restrict N], TYPE > > c[restrict > > > > N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i+=2) > > > > > > > > > > + { > > > > > > > > > > + c[i] = a[i] - b[i+1]; > > > > > > > > > > + c[i+1] = a[i+1] + b[i]; > > > > > > > > > > + } > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* { dg-final { scan-tree-dump-times > > > > "stmt.*COMPLEX_ADD_ROT90" > > > > > > 1 > > > > > > > > > "vect" } } */ > > > > > > > > > > + > > > > > > > > > > +void add270 (TYPE a[restrict N], TYPE b[restrict N], TYPE > > c[restrict > > > > N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i+=2) > > > > > > > > > > + { > > > > > > > > > > + c[i] = a[i] + b[i+1]; > > > > > > > > > > + c[i+1] = a[i+1] - b[i]; > > > > > > > > > > + } > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* { dg-final { scan-tree-dump-times > > > > > > "stmt.*COMPLEX_ADD_ROT270" 1 > > > > > > > > > "vect" } } */ > > > > > > > > > > + > > > > > > > > > > +void addMixed (TYPE a[restrict N], TYPE b[restrict N], TYPE > > > > c[restrict > > > > > > N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i+=4) > > > > > > > > > > + { > > > > > > > > > > + c[i] = a[i] - b[i+1]; > > > > > > > > > > + c[i+1] = a[i+1] + b[i]; > > > > > > > > > > + c[i+2] = a[i+2] + b[i+3]; > > > > > > > > > > + c[i+3] = a[i+3] - b[i+2]; > > > > > > > > > > + } > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +void add90HandUnrolled (TYPE a[restrict N], TYPE b[restrict > > N], > > > > > > > > > > + TYPE c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < (N /2); i+=4) > > > > > > > > > > + { > > > > > > > > > > + c[i] = a[i] - b[i+1]; > > > > > > > > > > + c[i+2] = a[i+2] - b[i+3]; > > > > > > > > > > + c[i+1] = a[i+1] + b[i]; > > > > > > > > > > + c[i+3] = a[i+3] + b[i+2]; > > > > > > > > > > + } > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* { dg-final { scan-tree-dump-times > > > > "stmt.*COMPLEX_ADD_ROT90" > > > > > > 1 > > > > > > > > > "vect" } } */ > > > > > > > > > > + > > > > > > > > > > +void add90Hybrid (TYPE a[restrict N], TYPE b[restrict N], TYPE > > > > > > c[restrict N], > > > > > > > > > > + TYPE d[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i+=2) > > > > > > > > > > + { > > > > > > > > > > + c[i] = a[i] - b[i+1]; > > > > > > > > > > + c[i+1] = a[i+1] + b[i]; > > > > > > > > > > + d[i] = a[i] - b[i]; > > > > > > > > > > + d[i+1] = a[i+1] - b[i+1]; > > > > > > > > > > + } > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* { dg-final { scan-tree-dump-times > > > > "stmt.*COMPLEX_ADD_ROT90" > > > > > > 2 > > > > > > > > > "vect" } } */ > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex-add- > > > > > > template.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/complex-add-template.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..afe08e867473695f0a742de330 > > > > > > > > > 944f495bc541d7 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex-add- > > > > template.c > > > > > > > > > > @@ -0,0 +1,77 @@ > > > > > > > > > > +void add0 (TYPE _Complex a[restrict N], TYPE _Complex > > > > b[restrict N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +void add90snd (TYPE _Complex a[restrict N], TYPE _Complex > > > > > > b[restrict N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + (b[i] * 1.0i); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* { dg-final { scan-tree-dump-times > > > > "stmt.*COMPLEX_ADD_ROT90" > > > > > > 1 > > > > > > > > > "vect" } } */ > > > > > > > > > > + > > > > > > > > > > +void add180snd (TYPE _Complex a[restrict N], TYPE _Complex > > > > > > b[restrict N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + (b[i] * 1.0i * 1.0i); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +void add270snd (TYPE _Complex a[restrict N], TYPE _Complex > > > > > > b[restrict N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* { dg-final { scan-tree-dump-times > > > > > > "stmt.*COMPLEX_ADD_ROT270" 1 > > > > > > > > > "vect" } } */ > > > > > > > > > > + > > > > > > > > > > +void add90fst (TYPE _Complex a[restrict N], TYPE _Complex > > > > > > b[restrict N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = (a[i] * 1.0i) + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* { dg-final { scan-tree-dump-times > > > > "stmt.*COMPLEX_ADD_ROT90" > > > > > > 1 > > > > > > > > > "vect" } } */ > > > > > > > > > > + > > > > > > > > > > +void add180fst (TYPE _Complex a[restrict N], TYPE _Complex > > > > > > b[restrict N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = (a[i] * 1.0i * 1.0i) + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +void add270fst (TYPE _Complex a[restrict N], TYPE _Complex > > > > > > b[restrict N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = (a[i] * 1.0i * 1.0i * 1.0i) + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* { dg-final { scan-tree-dump-times > > > > > > "stmt.*COMPLEX_ADD_ROT270" 1 > > > > > > > > > "vect" } } */ > > > > > > > > > > + > > > > > > > > > > +void addconjfst (TYPE _Complex a[restrict N], TYPE _Complex > > > > > > b[restrict N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = ~a[i] + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +void addconjsnd (TYPE _Complex a[restrict N], TYPE > > _Complex > > > > > > b[restrict > > > > > > > > > N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + ~b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +void addconjboth (TYPE _Complex a[restrict N], TYPE > > _Complex > > > > > > b[restrict > > > > > > > > > N], > > > > > > > > > > + TYPE _Complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = ~a[i] + ~b[i]; > > > > > > > > > > +} > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex- > > > > operations- > > > > > > run.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/complex-operations-run.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..a0348a7041ca384104bc5ab688 > > > > > > > > > d941c14e5b7381 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex- > > operations- > > > > run.c > > > > > > > > > > @@ -0,0 +1,103 @@ > > > > > > > > > > +/* { dg-do run } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#include > > > > > > > > > > +#include > > > > > > > > > > +#include > > > > > > > > > > +#include > > > > > > > > > > +#include > > > > > > > > > > + > > > > > > > > > > +#define PREF old > > > > > > > > > > +#pragma GCC push_options > > > > > > > > > > +#pragma GCC optimize ("no-tree-vectorize") > > > > > > > > > > +# include "complex-operations.c" > > > > > > > > > > +#pragma GCC pop_options > > > > > > > > > > +#undef PREF > > > > > > > > > > + > > > > > > > > > > +#define PREF new > > > > > > > > > > +# include "complex-operations.c" > > > > > > > > > > +#undef PREF > > > > > > > > > > + > > > > > > > > > > +#define TYPE double > > > > > > > > > > +#define TYPE2 double > > > > > > > > > > +#define EP pow(2, -45) > > > > > > > > > > + > > > > > > > > > > +#define xstr(s) str(s) > > > > > > > > > > +#define str(s) #s > > > > > > > > > > + > > > > > > > > > > +#define FCMP(A, B) \ > > > > > > > > > > + ((fabs (creal (A) - creal (B)) <= EP) && (fabs (cimag (A) - > > cimag > > > > (B)) > > > > > > <= EP)) > > > > > > > > > > + > > > > > > > > > > +#define CMP(A, B) \ > > > > > > > > > > + (FCMP(A,B) ? "PASS" : "FAIL") > > > > > > > > > > + > > > > > > > > > > +#define COMPARE(A,B) \ > > > > > > > > > > + memset (&c1, 0, sizeof (c1)); \ > > > > > > > > > > + memset (&c2, 0, sizeof (c2)); \ > > > > > > > > > > + A; B; \ > > > > > > > > > > + if (!FCMP(c1[0],c2[0]) || !FCMP(c1[1], c2[1])) \ > > > > > > > > > > + { \ > > > > > > > > > > + printf ("=> %s vs %s\n", xstr (A), xstr (B)); \ > > > > > > > > > > + printf ("%a\n", creal (c1[0]) - creal (c2[0])); \ > > > > > > > > > > + printf ("%a\n", cimag (c1[1]) - cimag (c2[1])); \ > > > > > > > > > > + printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[0]), > > cimag > > > > > > (c1[0]), > > > > > > > > > creal (c2[0]), cimag (c2[0]), CMP (c1[0], c2[0])); \ > > > > > > > > > > + printf ("%.2f+%.2fI == %.2f+%.2fI (%s)\n", creal (c1[1]), > > cimag > > > > > > (c1[1]), > > > > > > > > > creal (c2[1]), cimag (c2[1]), CMP (c1[1], c2[1])); \ > > > > > > > > > > + printf ("\n"); \ > > > > > > > > > > + __builtin_abort (); \ > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > +int main () > > > > > > > > > > +{ > > > > > > > > > > + TYPE2 complex a[] = { 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, > > 2.0 + > > > > 3.5 > > > > > > * I, > > > > > > > > > 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 > > + > > > > 3.5 * > > > > > > I, 1.0 > > > > > > > > > + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + > > 3.5 * > > > > I, > > > > > > 1.0 + > > > > > > > > > 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 > > * I, > > > > 1.0 > > > > > > + 3.0 > > > > > > > > > * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I, > > 1.0 > > > > + > > > > > > 3.0 * I, > > > > > > > > > 2.0 + 3.5 * I, 1.0 + 3.0 * I, 2.0 + 3.5 * I }; > > > > > > > > > > + TYPE complex b[] = { 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, > > 2.1 + > > > > 3.6 > > > > > > * I, > > > > > > > > > 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 > > + > > > > 3.6 * > > > > > > I, 1.1 > > > > > > > > > + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + > > 3.6 * > > > > I, > > > > > > 1.1 + > > > > > > > > > 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 > > * I, > > > > 1.1 > > > > > > + 3.1 > > > > > > > > > * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I, > > 1.1 > > > > + > > > > > > 3.1 * I, > > > > > > > > > 2.1 + 3.6 * I, 1.1 + 3.1 * I, 2.1 + 3.6 * I }; > > > > > > > > > > + TYPE complex c2[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, > > > > 0, 0, > > > > > > 0, 0, > > > > > > > > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > > > > > > > > > > + TYPE complex c1[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > 0, > > > > 0, 0, > > > > > > 0, 0, > > > > > > > > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > > > > > > > > > > + TYPE diff1, diff2; > > > > > > > > > > + > > > > > > > > > > + COMPARE(fma0_old(a, b, c1), fma0_new(a, b, c2)); > > > > > > > > > > + COMPARE(fma90_old(a, b, c1), fma90_new(a, b, c2)); > > > > > > > > > > + COMPARE(fma180_old(a, b, c1), fma180_new(a, b, c2)); > > > > > > > > > > + COMPARE(fma270_old(a, b, c1), fma270_new(a, b, c2)); > > > > > > > > > > + COMPARE(fma0_snd_old(a, b, c1), fma0_snd_new(a, b, c2)); > > > > > > > > > > + COMPARE(fma90_snd_old(a, b, c1), fma90_snd_new(a, b, > > c2)); > > > > > > > > > > + COMPARE(fma180_snd_old(a, b, c1), fma180_snd_new(a, b, > > > > c2)); > > > > > > > > > > + COMPARE(fma270_snd_old(a, b, c1), fma270_snd_new(a, b, > > > > c2)); > > > > > > > > > > + COMPARE(fma_conj_first_old(a, b, c1), > > fma_conj_first_new(a, > > > > b, > > > > > > c2)); > > > > > > > > > > + COMPARE(fma_conj_second_old(a, b, c1), > > > > > > fma_conj_second_new(a, b, > > > > > > > > > c2)); > > > > > > > > > > + COMPARE(fma_conj_both_old(a, b, c1), > > > > fma_conj_both_new(a, b, > > > > > > c2)); > > > > > > > > > > + COMPARE(fms0_old(a, b, c1), fms0_new(a, b, c2)); > > > > > > > > > > + COMPARE(fms90_old(a, b, c1), fms90_new(a, b, c2)); > > > > > > > > > > + COMPARE(fms180_old(a, b, c1), fms180_new(a, b, c2)); > > > > > > > > > > + COMPARE(fms270_old(a, b, c1), fms270_new(a, b, c2)); > > > > > > > > > > + COMPARE(fms0_snd_old(a, b, c1), fms0_snd_new(a, b, c2)); > > > > > > > > > > + COMPARE(fms90_snd_old(a, b, c1), fms90_snd_new(a, b, > > c2)); > > > > > > > > > > + COMPARE(fms180_snd_old(a, b, c1), fms180_snd_new(a, b, > > > > c2)); > > > > > > > > > > + COMPARE(fms270_snd_old(a, b, c1), fms270_snd_new(a, b, > > > > c2)); > > > > > > > > > > + COMPARE(fms_conj_first_old(a, b, c1), > > fms_conj_first_new(a, > > > > b, > > > > > > c2)); > > > > > > > > > > + COMPARE(fms_conj_second_old(a, b, c1), > > > > > > fms_conj_second_new(a, b, > > > > > > > > > c2)); > > > > > > > > > > + COMPARE(fms_conj_both_old(a, b, c1), > > fms_conj_both_new(a, > > > > b, > > > > > > c2)); > > > > > > > > > > + COMPARE(mul0_old(a, b, c1), mul0_new(a, b, c2)); > > > > > > > > > > + COMPARE(mul90_old(a, b, c1), mul90_new(a, b, c2)); > > > > > > > > > > + COMPARE(mul180_old(a, b, c1), mul180_new(a, b, c2)); > > > > > > > > > > + COMPARE(mul270_old(a, b, c1), mul270_new(a, b, c2)); > > > > > > > > > > + COMPARE(mul0_snd_old(a, b, c1), mul0_snd_new(a, b, c2)); > > > > > > > > > > + COMPARE(mul90_snd_old(a, b, c1), mul90_snd_new(a, b, > > c2)); > > > > > > > > > > + COMPARE(mul180_snd_old(a, b, c1), mul180_snd_new(a, b, > > > > c2)); > > > > > > > > > > + COMPARE(mul270_snd_old(a, b, c1), mul270_snd_new(a, b, > > > > c2)); > > > > > > > > > > + COMPARE(mul_conj_first_old(a, b, c1), > > mul_conj_first_new(a, > > > > b, > > > > > > c2)); > > > > > > > > > > + COMPARE(mul_conj_second_old(a, b, c1), > > > > > > mul_conj_second_new(a, b, > > > > > > > > > c2)); > > > > > > > > > > + COMPARE(mul_conj_both_old(a, b, c1), > > mul_conj_both_new(a, > > > > b, > > > > > > c2)); > > > > > > > > > > + COMPARE(add0_old(a, b, c1), add0_new(a, b, c2)); > > > > > > > > > > + COMPARE(add90_old(a, b, c1), add90_new(a, b, c2)); > > > > > > > > > > + COMPARE(add180_old(a, b, c1), add180_new(a, b, c2)); > > > > > > > > > > + COMPARE(add270_old(a, b, c1), add270_new(a, b, c2)); > > > > > > > > > > + COMPARE(add0_snd_old(a, b, c1), add0_snd_new(a, b, c2)); > > > > > > > > > > + COMPARE(add90_snd_old(a, b, c1), add90_snd_new(a, b, > > c2)); > > > > > > > > > > + COMPARE(add180_snd_old(a, b, c1), add180_snd_new(a, b, > > > > c2)); > > > > > > > > > > + COMPARE(add270_snd_old(a, b, c1), add270_snd_new(a, b, > > > > c2)); > > > > > > > > > > + COMPARE(add_conj_first_old(a, b, c1), > > add_conj_first_new(a, > > > > b, > > > > > > c2)); > > > > > > > > > > + COMPARE(add_conj_second_old(a, b, c1), > > > > > > add_conj_second_new(a, b, > > > > > > > > > c2)); > > > > > > > > > > + COMPARE(add_conj_both_old(a, b, c1), > > add_conj_both_new(a, > > > > b, > > > > > > c2)); > > > > > > > > > > +} > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/complex- > > > > operations.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/complex-operations.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..fdce995481d23c6a536293c8ee > > > > > > > > > 59eaf9ca9239bf > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/complex- > > operations.c > > > > > > > > > > @@ -0,0 +1,358 @@ > > > > > > > > > > +#include > > > > > > > > > > +#include > > > > > > > > > > + > > > > > > > > > > +#ifndef PREF > > > > > > > > > > +#define PREF c > > > > > > > > > > +#endif > > > > > > > > > > + > > > > > > > > > > +#define FX(N,P) P ## _ ## N > > > > > > > > > > +#define MK(N,P) FX(P,N) > > > > > > > > > > + > > > > > > > > > > +#define N 32 > > > > > > > > > > +#define TYPE double > > > > > > > > > > + > > > > > > > > > > +// ------ FMA > > > > > > > > > > + > > > > > > > > > > +// Complex FMA instructions rotating the result > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma0, PREF) (TYPE complex a[restrict N], TYPE > > complex > > > > > > b[restrict > > > > > > > > > N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma90, PREF) (TYPE complex a[restrict N], TYPE > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * b[i] * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma180, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * b[i] * I * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma270, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * b[i] * I * I * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// Complex FMA instructions rotating the second parameter. > > > > > > > > > > + > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma0_snd, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma90_snd, PREF) (TYPE complex a[restrict N], TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * (b[i] * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma180_snd, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * (b[i] * I * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma270_snd, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * (b[i] * I * I * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// Complex FMA instructions with conjucated values. > > > > > > > > > > + > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma_conj_first, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += conj (a[i]) * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma_conj_second, PREF) (TYPE complex a[restrict > > N], > > > > TYPE > > > > > > > > > complex b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += a[i] * conj (b[i]); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fma_conj_both, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] += conj (a[i]) * conj (b[i]); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// ----- FMS > > > > > > > > > > + > > > > > > > > > > +// Complex FMS instructions rotating the result > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms0, PREF) (TYPE complex a[restrict N], TYPE > > complex > > > > > > b[restrict > > > > > > > > > N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms90, PREF) (TYPE complex a[restrict N], TYPE > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * b[i] * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms180, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * b[i] * I * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms270, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * b[i] * I * I * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// Complex FMS instructions rotating the second parameter. > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms0_snd, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms90_snd, PREF) (TYPE complex a[restrict N], TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * (b[i] * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms180_snd, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * (b[i] * I * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms270_snd, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * (b[i] * I * I * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// Complex FMS instructions with conjucated values. > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms_conj_first, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= conj (a[i]) * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms_conj_second, PREF) (TYPE complex a[restrict N], > > > > TYPE > > > > > > > > > complex b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= a[i] * conj (b[i]); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(fms_conj_both, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] -= conj (a[i]) * conj (b[i]); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > + > > > > > > > > > > +// ----- MUL > > > > > > > > > > + > > > > > > > > > > +// Complex MUL instructions rotating the result > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul0, PREF) (TYPE complex a[restrict N], TYPE > > complex > > > > > > b[restrict > > > > > > > > > N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul90, PREF) (TYPE complex a[restrict N], TYPE > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * b[i] * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul180, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * b[i] * I * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul270, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * b[i] * I * I * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// Complex MUL instructions rotating the second parameter. > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul0_snd, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul90_snd, PREF) (TYPE complex a[restrict N], TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * (b[i] * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul180_snd, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * (b[i] * I * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul270_snd, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * (b[i] * I * I * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// Complex FMS instructions with conjucated values. > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul_conj_first, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = conj (a[i]) * b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul_conj_second, PREF) (TYPE complex a[restrict N], > > > > TYPE > > > > > > > > > complex b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] * conj (b[i]); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(mul_conj_both, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = conj (a[i]) * conj (b[i]); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > + > > > > > > > > > > +// ----- ADD > > > > > > > > > > + > > > > > > > > > > +// Complex ADD instructions rotating the result > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add0, PREF) (TYPE complex a[restrict N], TYPE > > complex > > > > > > b[restrict > > > > > > > > > N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add90, PREF) (TYPE complex a[restrict N], TYPE > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = (a[i] + b[i]) * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add180, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = (a[i] + b[i]) * I * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add270, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = (a[i] + b[i]) * I * I * I; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// Complex ADD instructions rotating the second parameter. > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add0_snd, PREF) (TYPE complex a[restrict N], TYPE > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add90_snd, PREF) (TYPE complex a[restrict N], TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + (b[i] * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add180_snd, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + (b[i] * I * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add270_snd, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + (b[i] * I * I * I); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +// Complex ADD instructions with conjucated values. > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add_conj_first, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = conj (a[i]) + b[i]; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add_conj_second, PREF) (TYPE complex a[restrict N], > > > > TYPE > > > > > > > > > complex b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = a[i] + conj (b[i]); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +__attribute__((noinline,noipa)) > > > > > > > > > > +void MK(add_conj_both, PREF) (TYPE complex a[restrict N], > > TYPE > > > > > > complex > > > > > > > > > b[restrict N], TYPE complex c[restrict N]) > > > > > > > > > > +{ > > > > > > > > > > + for (int i=0; i < N; i++) > > > > > > > > > > + c[i] = conj (a[i]) + conj (b[i]); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > + > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb- > > slp- > > > > > > complex- > > > > > > > > > add-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > bb- > > > > slp- > > > > > > > > > complex-add-double.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..b5c252b176c7c21c9484574edc > > > > > > > > > 9a56d9d142e13c > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > double.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE double > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb- > > slp- > > > > > > complex- > > > > > > > > > add-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb- > > slp- > > > > > > complex- > > > > > > > > > add-float.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..1a08e00bcede874d6acac9e2e > > > > > > > > > bece5851c583530 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > float.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_float } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE float > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb- > > slp- > > > > > > complex- > > > > > > > > > add-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast- > > math- > > > > bb- > > > > > > slp- > > > > > > > > > complex-add-half-float.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..e4d5c55c0a88f4ac8d45262ee1 > > > > > > > > > 3632443318931f > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > half-float.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_half } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE _Float16 > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb- > > slp- > > > > > > complex- > > > > > > > > > add-pattern-double.c > > b/gcc/testsuite/gcc.dg/vect/complex/fast- > > > > math- > > > > > > bb- > > > > > > > > > slp-complex-add-pattern-double.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..6dd3f98a7a52b21f0365cd6c43 > > > > > > > > > 94b20927a6a320 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > pattern-double.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE double > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb- > > slp- > > > > > > complex- > > > > > > > > > add-pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast- > > > > math- > > > > > > bb-slp- > > > > > > > > > complex-add-pattern-float.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..3d02cd455340e9510ae536d8d > > > > > > > > > 109b39f811743f0 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > pattern-float.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_float } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE float > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb- > > slp- > > > > > > complex- > > > > > > > > > add-pattern-half-float.c > > b/gcc/testsuite/gcc.dg/vect/complex/fast- > > > > > > math-bb- > > > > > > > > > slp-complex-add-pattern-half-float.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..51dcd2724f51cb2d91f0aa234a > > > > > > > > > bc39f92275aa42 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-bb-slp- > > > > complex- > > > > > > add- > > > > > > > > > pattern-half-float.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_half } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE _Float16 > > > > > > > > > > +#define N 16 > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > > > complex- > > > > > > add- > > > > > > > > > double.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > complex- > > > > > > add- > > > > > > > > > double.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..606b8992b4890e4e221315776 > > > > > > > > > 1bfac62f72aa40e > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex- > > > > add- > > > > > > > > > double.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE double > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > > > complex- > > > > > > add- > > > > > > > > > float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > complex- > > > > add- > > > > > > float.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..5c640f0b14107b7cb8ad153597 > > > > > > > > > 5d266e00b1d1b2 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex- > > > > add- > > > > > > float.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_float } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE float > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > > > complex- > > > > > > add- > > > > > > > > > half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > > > complex- > > > > > > add- > > > > > > > > > half-float.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..6111356cbd4a9c86a9356bf674 > > > > > > > > > 70512db44cfed2 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex- > > > > add- > > > > > > half- > > > > > > > > > float.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_half } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE _Float16 > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > > > complex- > > > > > > add- > > > > > > > > > pattern-double.c b/gcc/testsuite/gcc.dg/vect/complex/fast- > > math- > > > > > > complex- > > > > > > > > > add-pattern-double.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..00f383d8cfddd1176cf4894ac7f > > > > > > > > > d4d0ae9bcb297 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex- > > > > add- > > > > > > > > > pattern-double.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_double } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE double > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > > > complex- > > > > > > add- > > > > > > > > > pattern-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > > > > > complex- > > > > > > > > > add-pattern-float.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..ed108b14a3b704819a3c425b4 > > > > > > > > > d19d1103aeb432d > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex- > > > > add- > > > > > > > > > pattern-float.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_float } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE float > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/fast-math- > > > > complex- > > > > > > add- > > > > > > > > > pattern-half-float.c b/gcc/testsuite/gcc.dg/vect/complex/fast- > > > > math- > > > > > > > > > complex-add-pattern-half-float.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..aa239445a6563ea0ee15751a7 > > > > > > > > > f6a989fb1c9d9a7 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/fast-math-complex- > > > > add- > > > > > > > > > pattern-half-float.c > > > > > > > > > > @@ -0,0 +1,8 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_half } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_fp16_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE _Float16 > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > byte.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-byte.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..4001f689671e0973b64665e6b > > > > > > > > > 9ea96c755277fae > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > byte.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int8_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > int.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-int.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..1f006556af09027f22cefe12947 > > > > > > > > > 5bd7e977054a0 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > int.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int32_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > long.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-long.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..1e82657abf8316228e13651d1 > > > > > > > > > 11b7d256d0f266f > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > long.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int64_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > byte.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..db72e147c9dc4511fb46a0366 > > > > > > > > > 79b7ba77b97ffe3 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > byte.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int8_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > int.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > pattern- > > > > > > int.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..8d350d69ae0eefba073aba8ae > > > > > > > > > 7b3da4b39c845df > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern-int.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int32_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > long.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > long.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..c8e56cd4f91bc6254a5fb2177b > > > > > > > > > 1f2484859bcf98 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > long.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int64_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > short.c b/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > short.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..2c54d756c9b2f54352d6dba97c > > > > > > > > > cf05d37865cbaa > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > short.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int16_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect- > > > > complex- > > > > > > add- > > > > > > > > > pattern-unsigned-byte.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..f54b903aa308a5dc68654b9ffd > > > > > > > > > 0a0c230f58e4cc > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > unsigned-byte.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint8_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect- > > complex- > > > > > > add- > > > > > > > > > pattern-unsigned-int.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..96824f16b821236f5499dcb904 > > > > > > > > > 54e72a1326df5c > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > unsigned-int.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint32_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect- > > > > complex- > > > > > > add- > > > > > > > > > pattern-unsigned-long.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..8bd9f077b233eaf6e0c4ff4df9 > > > > > > > > > b97c109df7d002 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > unsigned-long.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint64_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > pattern- > > > > > > > > > unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect- > > > > complex- > > > > > > add- > > > > > > > > > pattern-unsigned-short.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..7e5154d73703512dceda39e37 > > > > > > > > > f0ebd0eb7c2e057 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > pattern- > > > > > > > > > unsigned-short.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint16_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-pattern-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > short.c > > > > > > > > > b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add-short.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..ca0d618b991255f3ba34ee40f > > > > > > > > > b876fd053e8121b > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > short.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE int16_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > > > > unsigned-byte.c b/gcc/testsuite/gcc.dg/vect/complex/vect- > > > > complex- > > > > > > add- > > > > > > > > > unsigned-byte.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..925cfc2ea27b0d4ffbdadfb86a > > > > > > > > > bc5c198f57469d > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > unsigned- > > > > > > > > > byte.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_byte } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint8_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > > > > unsigned-int.c b/gcc/testsuite/gcc.dg/vect/complex/vect- > > complex- > > > > > > add- > > > > > > > > > unsigned-int.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..6a70c6ebf0586c11a17cb1ad2c > > > > > > > > > add0d5927c2aca > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > unsigned- > > > > > > > > > int.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_int } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint32_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > > > > unsigned-long.c b/gcc/testsuite/gcc.dg/vect/complex/vect- > > > > complex- > > > > > > add- > > > > > > > > > unsigned-long.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..084080aeb4386bf41b0e23d0c > > > > > > > > > 684917b2b0435d1 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > unsigned- > > > > > > > > > long.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_long } */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint64_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/complex/vect-complex- > > add- > > > > > > > > > unsigned-short.c b/gcc/testsuite/gcc.dg/vect/complex/vect- > > > > complex- > > > > > > add- > > > > > > > > > unsigned-short.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..1379608a60310fd26b18e3db2 > > > > > > > > > b6294c28bf5bf2e > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/testsuite/gcc.dg/vect/complex/vect-complex-add- > > > > > > unsigned- > > > > > > > > > short.c > > > > > > > > > > @@ -0,0 +1,9 @@ > > > > > > > > > > +/* { dg-do compile } */ > > > > > > > > > > +/* { dg-require-effective-target vect_complex_add_short } > > */ > > > > > > > > > > +/* { dg-add-options arm_v8_3a_complex_neon } */ > > > > > > > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > > > > > > > + > > > > > > > > > > +#define TYPE uint16_t > > > > > > > > > > +#define N 200 > > > > > > > > > > +#include > > > > > > > > > > +#include "complex-add-template.c" > > > > > > > > > > \ No newline at end of file > > > > > > > > > > diff --git a/gcc/testsuite/lib/target-supports.exp > > > > > > b/gcc/testsuite/lib/target- > > > > > > > > > supports.exp > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 22acda2a74fdfa51aebbc311d5cc84763b0ffc63..baa5e4a569263edda2125bd8ac > > > > > > > > > a6f5b19bbad783 100644 > > > > > > > > > > --- a/gcc/testsuite/lib/target-supports.exp > > > > > > > > > > +++ b/gcc/testsuite/lib/target-supports.exp > > > > > > > > > > @@ -3355,7 +3355,102 @@ proc > > check_effective_target_vect_int > > > > { } { > > > > > > > > > > }}] > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > -# Return 1 if the target supports signed int->float conversion > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of > > > > complex > > > > > > > > > additions of > > > > > > > > > > +# byte, 0 otherwise. > > > > > > > > > > +# > > > > > > > > > > +# This won't change for different subtargets so cache the > > result. > > > > > > > > > > + > > > > > > > > > > +proc check_effective_target_vect_complex_add_byte { } { > > > > > > > > > > + return [check_cached_effective_target_indexed > > > > > > > > > vect_complex_add_byte { > > > > > > > > > > + expr { > > > > > > > > > > + [check_effective_target_aarch64_sve2] > > > > > > > > > > + || > > [check_effective_target_arm_v8_1m_mve_fp_ok] > > > > > > > > > > + }}] > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of > > > > complex > > > > > > > > > additions of > > > > > > > > > > +# short, 0 otherwise. > > > > > > > > > > +# > > > > > > > > > > +# This won't change for different subtargets so cache the > > result. > > > > > > > > > > + > > > > > > > > > > +proc check_effective_target_vect_complex_add_short { } { > > > > > > > > > > + return [check_cached_effective_target_indexed > > > > > > > > > vect_complex_add_short { > > > > > > > > > > + expr { > > > > > > > > > > + [check_effective_target_aarch64_sve2] > > > > > > > > > > + || > > [check_effective_target_arm_v8_1m_mve_fp_ok] > > > > > > > > > > + }}] > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of > > > > complex > > > > > > > > > additions of > > > > > > > > > > +# int, 0 otherwise. > > > > > > > > > > +# > > > > > > > > > > +# This won't change for different subtargets so cache the > > result. > > > > > > > > > > + > > > > > > > > > > +proc check_effective_target_vect_complex_add_int { } { > > > > > > > > > > + return [check_cached_effective_target_indexed > > > > > > > > > vect_complex_add_int { > > > > > > > > > > + expr { > > > > > > > > > > + [check_effective_target_aarch64_sve2] > > > > > > > > > > + || > > [check_effective_target_arm_v8_1m_mve_fp_ok] > > > > > > > > > > + }}] > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of > > > > complex > > > > > > > > > additions of > > > > > > > > > > +# long, 0 otherwise. > > > > > > > > > > +# > > > > > > > > > > +# This won't change for different subtargets so cache the > > result. > > > > > > > > > > + > > > > > > > > > > +proc check_effective_target_vect_complex_add_long { } { > > > > > > > > > > + return [check_cached_effective_target_indexed > > > > > > > > > vect_complex_add_long { > > > > > > > > > > + expr { > > > > > > > > > > + [check_effective_target_aarch64_sve2] > > > > > > > > > > + || > > [check_effective_target_arm_v8_1m_mve_fp_ok] > > > > > > > > > > + }}] > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of > > > > complex > > > > > > > > > additions of > > > > > > > > > > +# half, 0 otherwise. > > > > > > > > > > +# > > > > > > > > > > +# This won't change for different subtargets so cache the > > result. > > > > > > > > > > + > > > > > > > > > > +proc check_effective_target_vect_complex_add_half { } { > > > > > > > > > > + return [check_cached_effective_target_indexed > > > > > > > > > vect_complex_add_half { > > > > > > > > > > + expr { > > > > > > > > > > + > > [check_effective_target_arm_v8_3a_complex_neon_ok > > > > > > > > > > + && > > check_effective_target_arm_v8_2a_fp16_neon_ok] > > > > > > > > > > + || [check_effective_target_aarch64_sve2] > > > > > > > > > > + || > > [check_effective_target_arm_v8_1m_mve_fp_ok] > > > > > > > > > > + }}] > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of > > > > complex > > > > > > > > > additions of > > > > > > > > > > +# float, 0 otherwise. > > > > > > > > > > +# > > > > > > > > > > +# This won't change for different subtargets so cache the > > result. > > > > > > > > > > + > > > > > > > > > > +proc check_effective_target_vect_complex_add_float { } { > > > > > > > > > > + return [check_cached_effective_target_indexed > > > > > > > > > vect_complex_add_float { > > > > > > > > > > + expr { > > > > > > > > > > + > > [check_effective_target_arm_v8_3a_complex_neon_ok] > > > > > > > > > > + || [check_effective_target_aarch64_sve2] > > > > > > > > > > + || > > [check_effective_target_arm_v8_1m_mve_fp_ok] > > > > > > > > > > + }}] > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +# Return 1 if the target supports hardware vectorization of > > > > complex > > > > > > > > > additions of > > > > > > > > > > +# double, 0 otherwise. > > > > > > > > > > +# > > > > > > > > > > +# This won't change for different subtargets so cache the > > result. > > > > > > > > > > + > > > > > > > > > > +proc check_effective_target_vect_complex_add_double { } { > > > > > > > > > > + return [check_cached_effective_target_indexed > > > > > > > > > vect_complex_add_double { > > > > > > > > > > + expr { > > > > > > > > > > + > > [check_effective_target_arm_v8_3a_complex_neon_ok] > > > > > > > > > > + || [check_effective_target_aarch64_sve2] > > > > > > > > > > + || > > [check_effective_target_arm_v8_1m_mve_fp_ok] > > > > > > > > > > + }}] > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +# Return 1 if the target supports signed int->float conversion > > > > > > > > > > # > > > > > > > > > > > > > > > > > > > > proc check_effective_target_vect_intfloat_cvt { } { > > > > > > > > > > @@ -10367,7 +10462,7 @@ proc > > > > > > > > > > > check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } > > > > { > > > > > > > > > > set et_arm_v8_3a_complex_neon_flags "" > > > > > > > > > > > > > > > > > > > > if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { > > > > > > > > > > - return 0; > > > > > > > > > > + return 1; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > # Iterate through sets of options to find the compiler flags > > that > > > > > > > > > > @@ -10380,11 +10475,11 @@ proc > > > > > > > > > > > check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } > > > > { > > > > > > > > > > #endif > > > > > > > > > > } "$flags -march=armv8.3-a"] } { > > > > > > > > > > set et_arm_v8_3a_complex_neon_flags "$flags - > > > > > > march=armv8.3-a" > > > > > > > > > > - return 1 > > > > > > > > > > + return 0; > > > > > > > > > > } > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > - return 0; > > > > > > > > > > + return 1; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > proc check_effective_target_arm_v8_3a_complex_neon_ok > > { } { > > > > > > > > > > @@ -10400,13 +10495,57 @@ proc > > > > > > > > > add_options_for_arm_v8_3a_complex_neon { flags } { > > > > > > > > > > return "$flags $et_arm_v8_3a_complex_neon_flags" > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > +# Return 1 if the target supports ARMv8.3 Adv.SIMD + FP16 > > > > Complex > > > > > > > > > instructions > > > > > > > > > > +# instructions, 0 otherwise. The test is valid for ARM and for > > > > > > AArch64. > > > > > > > > > > +# Record the command line options needed. > > > > > > > > > > + > > > > > > > > > > +proc > > > > > > > > > > > > > check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache > > > > > > { } { > > > > > > > > > > + global et_arm_v8_3a_fp16_complex_neon_flags > > > > > > > > > > + set et_arm_v8_3a_fp16_complex_neon_flags "" > > > > > > > > > > + > > > > > > > > > > + if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { > > > > > > > > > > + return 1; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + # Iterate through sets of options to find the compiler flags > > that > > > > > > > > > > + # need to be added to the -march option. > > > > > > > > > > + foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat- > > > > > > abi=hard - > > > > > > > > > mfpu=auto"} { > > > > > > > > > > + if { [check_no_compiler_messages_nocache \ > > > > > > > > > > + arm_v8_3a_fp16_complex_neon_ok object { > > > > > > > > > > + #if !defined (__ARM_FEATURE_COMPLEX) > > > > > > > > > > + #error "__ARM_FEATURE_COMPLEX not defined" > > > > > > > > > > + #endif > > > > > > > > > > + } "$flags -march=armv8.3-a+fp16"] } { > > > > > > > > > > + set et_arm_v8_3a_fp16_complex_neon_flags \ > > > > > > > > > > + "$flags -march=armv8.3-a+fp16" > > > > > > > > > > + return 0; > > > > > > > > > > + } > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + return 1; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +proc > > > > check_effective_target_arm_v8_3a_fp16_complex_neon_ok { } > > > > > > { > > > > > > > > > > + return [check_cached_effective_target > > > > > > > > > arm_v8_3a_fp16_complex_neon_ok \ > > > > > > > > > > + > > > > > > > > > > > > > check_effective_target_arm_v8_3a_fp16_complex_neon_ok_nocache] > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +proc add_options_for_arm_v8_3a_fp16_complex_neon > > { flags } > > > > { > > > > > > > > > > + if { ! > > > > > > [check_effective_target_arm_v8_3a_fp16_complex_neon_ok] } { > > > > > > > > > > + return "$flags" > > > > > > > > > > + } > > > > > > > > > > + global et_arm_v8_3a_fp16_complex_neon_flags > > > > > > > > > > + return "$flags $et_arm_v8_3a_fp16_complex_neon_flags" > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > + > > > > > > > > > > # Return 1 if the target supports executing AdvSIMD > > instructions > > > > > > from > > > > > > > > > ARMv8.3 > > > > > > > > > > # with the complex instruction extension, 0 otherwise. The > > test is > > > > > > valid for > > > > > > > > > > # ARM and for AArch64. > > > > > > > > > > > > > > > > > > > > proc check_effective_target_arm_v8_3a_complex_neon_hw > > { } { > > > > > > > > > > if > > { ![check_effective_target_arm_v8_3a_complex_neon_ok] } > > > > { > > > > > > > > > > - return 0; > > > > > > > > > > + return 1; > > > > > > > > > > } > > > > > > > > > > return [check_runtime > > > > arm_v8_3a_complex_neon_hw_available { > > > > > > > > > > #include "arm_neon.h" > > > > > > > > > > @@ -10431,7 +10570,7 @@ proc > > > > > > > > > check_effective_target_arm_v8_3a_complex_neon_hw { } { > > > > > > > > > > : /* No clobbers. */); > > > > > > > > > > #endif > > > > > > > > > > > > > > > > > > > > - return (results[0] == 8 && results[1] == 24) ? 1 : 0; > > > > > > > > > > + return (results[0] == 8 && results[1] == 24) ? 0 : 1; > > > > > > > > > > } > > > > > > > > > > } [add_options_for_arm_v8_3a_complex_neon ""]] > > > > > > > > > > } > > > > > > > > > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp- > > > > patterns.c > > > > > > > > > > new file mode 100644 > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 0000000000000000000000000000000000000000..aeb402289277c4bb48b62b7e9 > > > > > > > > > e074850a99d3182 > > > > > > > > > > --- /dev/null > > > > > > > > > > +++ b/gcc/tree-vect-slp-patterns.c > > > > > > > > > > @@ -0,0 +1,739 @@ > > > > > > > > > > +/* SLP - Pattern matcher on SLP trees > > > > > > > > > > + Copyright (C) 2020 Free Software Foundation, Inc. > > > > > > > > > > + > > > > > > > > > > +This file is part of GCC. > > > > > > > > > > + > > > > > > > > > > +GCC is free software; you can redistribute it and/or modify it > > > > under > > > > > > > > > > +the terms of the GNU General Public License as published by > > the > > > > > > Free > > > > > > > > > > +Software Foundation; either version 3, or (at your option) > > any > > > > later > > > > > > > > > > +version. > > > > > > > > > > + > > > > > > > > > > +GCC is distributed in the hope that it will be useful, but > > WITHOUT > > > > > > ANY > > > > > > > > > > +WARRANTY; without even the implied warranty of > > > > > > MERCHANTABILITY or > > > > > > > > > > +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General > > > > Public > > > > > > > > > License > > > > > > > > > > +for more details. > > > > > > > > > > + > > > > > > > > > > +You should have received a copy of the GNU General Public > > > > License > > > > > > > > > > +along with GCC; see the file COPYING3. If not see > > > > > > > > > > +. */ > > > > > > > > > > + > > > > > > > > > > +#include "config.h" > > > > > > > > > > +#include "system.h" > > > > > > > > > > +#include "coretypes.h" > > > > > > > > > > +#include "backend.h" > > > > > > > > > > +#include "target.h" > > > > > > > > > > +#include "rtl.h" > > > > > > > > > > +#include "tree.h" > > > > > > > > > > +#include "gimple.h" > > > > > > > > > > +#include "tree-pass.h" > > > > > > > > > > +#include "ssa.h" > > > > > > > > > > +#include "optabs-tree.h" > > > > > > > > > > +#include "insn-config.h" > > > > > > > > > > +#include "recog.h" /* FIXME: for insn_data */ > > > > > > > > > > +#include "fold-const.h" > > > > > > > > > > +#include "stor-layout.h" > > > > > > > > > > +#include "gimple-iterator.h" > > > > > > > > > > +#include "cfgloop.h" > > > > > > > > > > +#include "tree-vectorizer.h" > > > > > > > > > > +#include "langhooks.h" > > > > > > > > > > +#include "gimple-walk.h" > > > > > > > > > > +#include "dbgcnt.h" > > > > > > > > > > +#include "tree-vector-builder.h" > > > > > > > > > > +#include "vec-perm-indices.h" > > > > > > > > > > +#include "gimple-fold.h" > > > > > > > > > > +#include "internal-fn.h" > > > > > > > > > > + > > > > > > > > > > +/* SLP Pattern matching mechanism. > > > > > > > > > > + > > > > > > > > > > + This extension to the SLP vectorizer allows one to transform > > the > > > > > > > > > generated SLP > > > > > > > > > > + tree based on any pattern. The difference between this > > and > > > > the > > > > > > normal > > > > > > > > > vect > > > > > > > > > > + pattern matcher is that unlike the former, this matcher > > allows > > > > you > > > > > > to > > > > > > > > > match > > > > > > > > > > + with instructions that do not belong to the same SSA > > dominator > > > > > > graph. > > > > > > > > > > + > > > > > > > > > > + The only requirement that this pattern matcher has is that > > you > > > > are > > > > > > only > > > > > > > > > > + only allowed to either match an entire group or none. > > > > > > > > > > + > > > > > > > > > > + The pattern matcher currently only allows you to perform > > > > > > replacements > > > > > > > > > to > > > > > > > > > > + internal functions. > > > > > > > > > > + > > > > > > > > > > + Once the patterns are matched it is one way, these cannot > > be > > > > > > undone. It > > > > > > > > > is > > > > > > > > > > + currently not supported to match patterns recursively. > > > > > > > > > > + > > > > > > > > > > + To add a new pattern, implement the vect_pattern class and > > > > add > > > > > > the > > > > > > > > > type to > > > > > > > > > > + slp_patterns. > > > > > > > > > > + > > > > > > > > > > +*/ > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +/********************************************************* > > > > > > > > > ********************** > > > > > > > > > > + * vect_pattern class > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > ********************************************************** > > > > > > > > > ********************/ > > > > > > > > > > + > > > > > > > > > > +/* Default implementation of recognize that peforms > > matching, > > > > > > validation > > > > > > > > > and > > > > > > > > > > + replacement of nodes but that can be overriden if required. > > */ > > > > > > > > > > + > > > > > > > > > > +static bool > > > > > > > > > > +vect_pattern_validate_optab (internal_fn ifn, slp_tree node) > > > > > > > > > > +{ > > > > > > > > > > + tree vectype = SLP_TREE_VECTYPE (node); > > > > > > > > > > + if (ifn == IFN_LAST || !vectype) > > > > > > > > > > + return false; > > > > > > > > > > + > > > > > > > > > > + if (dump_enabled_p ()) > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > > > > > > > + "Found %s pattern in SLP tree\n", > > > > > > > > > > + internal_fn_name (ifn)); > > > > > > > > > > + > > > > > > > > > > + if (direct_internal_fn_supported_p (ifn, vectype, > > > > > > > > > OPTIMIZE_FOR_SPEED)) > > > > > > > > > > + { > > > > > > > > > > + if (dump_enabled_p ()) > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > > > > > > > + "Target supports %s vectorization > > with > > > > > > mode %T\n", > > > > > > > > > > + internal_fn_name (ifn), vectype); > > > > > > > > > > + } > > > > > > > > > > + else > > > > > > > > > > + { > > > > > > > > > > + if (dump_enabled_p ()) > > > > > > > > > > + { > > > > > > > > > > + if (!vectype) > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > > > > > > > + "Target does not support vector > > type > > > > > > for %T\n", > > > > > > > > > > + SLP_TREE_DEF_TYPE (node)); > > > > > > > > > > + else > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > > > > > > > + "Target does not support %s for > > vector > > > > > > type " > > > > > > > > > > + "%T\n", internal_fn_name (ifn), > > vectype); > > > > > > > > > > + } > > > > > > > > > > + return false; > > > > > > > > > > + } > > > > > > > > > > + return true; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +/********************************************************* > > > > > > > > > ********************** > > > > > > > > > > + * General helper types > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > ********************************************************** > > > > > > > > > ********************/ > > > > > > > > > > + > > > > > > > > > > +/* The COMPLEX_OPERATION enum denotes the possible > > pair of > > > > > > > > > operations that can > > > > > > > > > > + be matched when looking for expressions that we are > > > > interested > > > > > > > > > matching for > > > > > > > > > > + complex numbers addition and mla. */ > > > > > > > > > > + > > > > > > > > > > +typedef enum _complex_operation : unsigned { > > > > > > > > > > + PLUS_PLUS, > > > > > > > > > > + MINUS_PLUS, > > > > > > > > > > + PLUS_MINUS, > > > > > > > > > > + MULT_MULT, > > > > > > > > > > + CMPLX_NONE > > > > > > > > > > +} complex_operation_t; > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +/********************************************************* > > > > > > > > > ********************** > > > > > > > > > > + * General helper functions > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > ********************************************************** > > > > > > > > > ********************/ > > > > > > > > > > + > > > > > > > > > > +/* Helper function of linear_loads_p that checks to see if the > > > > load > > > > > > > > > permutation > > > > > > > > > > + is sequential and in monotonically increasing order of loads > > with > > > > no > > > > > > gaps. > > > > > > > > > > +*/ > > > > > > > > > > + > > > > > > > > > > +static inline bool > > > > > > > > > > +is_linear_load_p (load_permutation_t loads) > > > > > > > > > > +{ > > > > > > > > > > + if (loads.length() == 0) > > > > > > > > > > + return false; > > > > > > > > > > + > > > > > > > > > > + unsigned leader = loads[0]; > > > > > > > > > > + unsigned load, i; > > > > > > > > > > + FOR_EACH_VEC_ELT_FROM (loads, i, load, 1) > > > > > > > > > > + if (load != ++leader) > > > > > > > > > > + return false; > > > > > > > > > > + return true; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > + > > > > > > > > > > +/* Check to see if all loads rooted in ROOT are linear. > > Linearity is > > > > > > > > > > + defined as having no gaps between values loaded. */ > > > > > > > > > > + > > > > > > > > > > +static load_permutation_t > > > > > > > > > > +linear_loads_p (slp_tree_to_load_perm_map_t > > *perm_cache, > > > > > > slp_tree > > > > > > > > > root, > > > > > > > > > > + bool *linear) > > > > > > > > > > +{ > > > > > > > > > > + *linear = false; > > > > > > > > > > + if (!root) > > > > > > > > > > + return vNULL; > > > > > > > > > > + > > > > > > > > > > + unsigned i; > > > > > > > > > > + load_permutation_t loads = vNULL; > > > > > > > > > > + load_permutation_t *tmp; > > > > > > > > > > + > > > > > > > > > > + if ((tmp = perm_cache->get (root)) != NULL) > > > > > > > > > > + { > > > > > > > > > > + *linear = is_linear_load_p (*tmp); > > > > > > > > > > + return *tmp; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + perm_cache->put (root, vNULL); > > > > > > > > > > + > > > > > > > > > > + /* If it's a load node, then just read the load permute. */ > > > > > > > > > > + if (SLP_TREE_LOAD_PERMUTATION (root).exists ()) > > > > > > > > > > + { > > > > > > > > > > + loads = SLP_TREE_LOAD_PERMUTATION (root); > > > > > > > > > > + perm_cache->put (root, loads); > > > > > > > > > > + if (!is_linear_load_p (loads)) > > > > > > > > > > + return loads; > > > > > > > > > > + } > > > > > > > > > > + else if (SLP_TREE_DEF_TYPE (root) == vect_external_def) > > > > > > > > > > + { > > > > > > > > > > + loads.create (SLP_TREE_LANES (root)); > > > > > > > > > > + tree op; > > > > > > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op) > > > > > > > > > > + { > > > > > > > > > > + if (TREE_CODE (op) != SSA_NAME) > > > > > > > > > > + return vNULL; > > > > > > > > > > + > > > > > > > > > > + gimple *defstmt = SSA_NAME_DEF_STMT (op); > > > > > > > > > > + if (!is_gimple_assign (defstmt)) > > > > > > > > > > + return vNULL; > > > > > > > > > > + > > > > > > > > > > + switch (gimple_assign_rhs_code (defstmt)) > > > > > > > > > > + { > > > > > > > > > > + case IMAGPART_EXPR: > > > > > > > > > > + loads.safe_push (1); > > > > > > > > > > + break; > > > > > > > > > > + case REALPART_EXPR: > > > > > > > > > > + loads.safe_push (0); > > > > > > > > > > + break; > > > > > > > > > > + default: > > > > > > > > > > + { > > > > > > > > > > + loads.release (); > > > > > > > > > > + return vNULL; > > > > > > > > > > + } > > > > > > > > > > + } > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + perm_cache->put (root, loads); > > > > > > > > > > + if (!is_linear_load_p (loads)) > > > > > > > > > > + return loads; > > > > > > > > > > + } > > > > > > > > > > + else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def) > > > > > > > > > > + return vNULL; > > > > > > > > > > + > > > > > > > > > > + auto_vec all_loads; > > > > > > > > > > + bool is_perm = SLP_TREE_LANE_PERMUTATION > > (root).exists (); > > > > > > > > > > + > > > > > > > > > > + slp_tree child; > > > > > > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child) > > > > > > > > > > + { > > > > > > > > > > + loads = linear_loads_p (perm_cache, child, linear); > > > > > > > > > > + if ((!*linear && !is_perm) || !loads.exists ()) > > > > > > > > > > + return loads; > > > > > > > > > > + > > > > > > > > > > + all_loads.safe_push (loads); > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + if (is_perm) > > > > > > > > > > + { > > > > > > > > > > + lane_permutation_t perm = > > SLP_TREE_LANE_PERMUTATION > > > > > > (root); > > > > > > > > > > + load_permutation_t nloads; > > > > > > > > > > + nloads.create (SLP_TREE_LANES (root)); > > > > > > > > > > + nloads.quick_grow (SLP_TREE_LANES (root)); > > > > > > > > > > + for (i = 0; i < SLP_TREE_LANES (root); i++) > > > > > > > > > > + nloads[i] = all_loads[perm[i].first][perm[i].second]; > > > > > > > > > > + > > > > > > > > > > + perm_cache->put (root, nloads); > > > > > > > > > > + if (!is_linear_load_p (nloads)) > > > > > > > > > > + return nloads; > > > > > > > > > > + loads = nloads; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + perm_cache->put (root, loads); > > > > > > > > > > + *linear = true; > > > > > > > > > > + return loads; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > + > > > > > > > > > > +/* This function attempts to make a node rooted in NODE > > with > > > > > > parent > > > > > > > > > PARENT > > > > > > > > > > + linear. If the node if already linear than the node itself is > > > > returned > > > > > > > > > > + in RESULT. > > > > > > > > > > + > > > > > > > > > > + If the node is not linear then a new VEC_PERM_EXPR node > > is > > > > > > created > > > > > > > > > with a > > > > > > > > > > + lane permute that when applied will make the node linear. > > If > > > > such > > > > > > a > > > > > > > > > > + permute cannot be created then FALSE is returned from > > the > > > > > > function. > > > > > > > > > > + > > > > > > > > > > + Here linearity is defined as having a sequential, monotically > > > > > > increasing > > > > > > > > > > + load position inside the load permute generated by the > > loads > > > > > > reachable > > > > > > > > > from > > > > > > > > > > + NODE. */ > > > > > > > > > > + > > > > > > > > > > +static bool > > > > > > > > > > +vect_slp_make_linear (slp_tree_to_load_perm_map_t > > > > > > *perm_cache, > > > > > > > > > > + slp_tree parent, slp_tree node, slp_tree > > *result) > > > > > > > > > > +{ > > > > > > > > > > + bool is_linear = false; > > > > > > > > > > + unsigned x, val; > > > > > > > > > > + load_permutation_t load_perm = linear_loads_p > > (perm_cache, > > > > > > node, > > > > > > > > > &is_linear); > > > > > > > > > > + if (is_linear) > > > > > > > > > > + { > > > > > > > > > > + *result = node; > > > > > > > > > > + SLP_TREE_REF_COUNT (node)++; > > > > > > > > > > + return true; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + /* Attempt to linearise the permute. */ > > > > > > > > > > + vec > zipped; > > > > > > > > > > + zipped.create (load_perm.length ()); > > > > > > > > > > + FOR_EACH_VEC_ELT (load_perm, x, val) > > > > > > > > > > + zipped.quick_push (std::make_pair (val, x)); > > > > > > > > > > + > > > > > > > > > > + typedef const std::pair* cmp_t; > > > > > > > > > > + zipped.qsort ([](const void *a, const void *b) -> int > > > > > > > > > > + { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; }); > > > > > > > > > > + > > > > > > > > > > + /* Verify if we have a linear permute sequence. */ > > > > > > > > > > + if (zipped.length () > 0) > > > > > > > > > > + { > > > > > > > > > > + unsigned leader = zipped[0].first; > > > > > > > > > > + for (x = 1; x < zipped.length (); x++) > > > > > > > > > > + if(!(is_linear = (zipped[x].first == ++leader))) > > > > > > > > > > + break; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + if (!is_linear) > > > > > > > > > > + { > > > > > > > > > > + if (dump_enabled_p ()) > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > > > > > > > + "Loads could not be made > > linear %p\n", > > > > > > > > > > + node); > > > > > > > > > > + zipped.release (); > > > > > > > > > > + return false; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + for (x = 0; x < zipped.length (); x++) > > > > > > > > > > + zipped[x].first = 0; > > > > > > > > > > + > > > > > > > > > > + /* Create the new permute node and store it instead. */ > > > > > > > > > > + slp_tree vnode = vect_create_new_slp_node (vNULL, 1); > > > > > > > > > > + SLP_TREE_CODE (vnode) = VEC_PERM_EXPR; > > > > > > > > > > + SLP_TREE_LANE_PERMUTATION (vnode) = zipped; > > > > > > > > > > + SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent); > > > > > > > > > > + SLP_TREE_CHILDREN (vnode).quick_push (node); > > > > > > > > > > + SLP_TREE_REF_COUNT (vnode) = 1; > > > > > > > > > > + SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node); > > > > > > > > > > + SLP_TREE_REPRESENTATIVE (vnode) = > > > > SLP_TREE_REPRESENTATIVE > > > > > > > > > (parent); > > > > > > > > > > + SLP_TREE_REF_COUNT (node)++; > > > > > > > > > > + *result = vnode; > > > > > > > > > > + return is_linear; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* Checks to see of the expression represented by NODE is a > > > > gimple > > > > > > > > > assign with > > > > > > > > > > + code CODE. */ > > > > > > > > > > + > > > > > > > > > > +static inline bool > > > > > > > > > > +vect_match_expression_p (slp_tree node, tree_code code) > > > > > > > > > > +{ > > > > > > > > > > + if (!node > > > > > > > > > > + || !SLP_TREE_REPRESENTATIVE (node)) > > > > > > > > > > + return false; > > > > > > > > > > + > > > > > > > > > > + gimple* expr = STMT_VINFO_STMT > > > > (SLP_TREE_REPRESENTATIVE > > > > > > > > > (node)); > > > > > > > > > > + if (!is_gimple_assign (expr) > > > > > > > > > > + || gimple_assign_rhs_code (expr) != code) > > > > > > > > > > + return false; > > > > > > > > > > + > > > > > > > > > > + return true; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* Check if the given lane permute in PERMUTES matches an > > > > > > alternating > > > > > > > > > sequence > > > > > > > > > > + of {P0 P1 P0 P1 ...}. This to account for unrolled loops. > > Further > > > > > > mode > > > > > > > > > > + there resulting permute must be linear. */ > > > > > > > > > > + > > > > > > > > > > +static inline bool > > > > > > > > > > +vect_check_lane_permute (lane_permutation_t &permutes, > > > > > > > > > > + unsigned p0, unsigned p1) > > > > > > > > > > +{ > > > > > > > > > > + if (permutes.length () == 0) > > > > > > > > > > + return false; > > > > > > > > > > + > > > > > > > > > > + unsigned val[2] = {p0, p1}; > > > > > > > > > > + unsigned seed = permutes[0].second; > > > > > > > > > > + for (unsigned i = 0; i < permutes.length (); i++) > > > > > > > > > > + if (permutes[i].first != val[i % 2] > > > > > > > > > > + || permutes[i].second != seed++) > > > > > > > > > > + return false; > > > > > > > > > > + > > > > > > > > > > + return true; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* This function will match the two gimple expressions > > > > representing > > > > > > > > > NODE1 and > > > > > > > > > > + NODE2 in parallel and returns the pair operation that > > > > represents > > > > > > the two > > > > > > > > > > + expressions in the two statements. > > > > > > > > > > + > > > > > > > > > > + If match is successful then the corresponding > > > > complex_operation is > > > > > > > > > > + returned and the arguments to the two matched > > operations > > > > are > > > > > > > > > returned in OPS. > > > > > > > > > > + > > > > > > > > > > + If TWO_OPERANDS it is expected that the LANES of the > > parent > > > > > > > > > VEC_PERM select > > > > > > > > > > + from the two nodes alternatingly. > > > > > > > > > > + > > > > > > > > > > + If unsuccessful then CMPLX_NONE is returned and OPS is > > > > > > untouched. > > > > > > > > > > + > > > > > > > > > > + e.g. the following gimple statements > > > > > > > > > > + > > > > > > > > > > + stmt 0 _39 = _37 + _12; > > > > > > > > > > + stmt 1 _6 = _38 - _36; > > > > > > > > > > + > > > > > > > > > > + will return PLUS_MINUS along with OPS containing {_37, > > _12, > > > > _38, > > > > > > _36}. > > > > > > > > > > +*/ > > > > > > > > > > + > > > > > > > > > > +static complex_operation_t > > > > > > > > > > +vect_detect_pair_op (slp_tree node1, slp_tree node2, > > > > > > > > > lane_permutation_t &lanes, > > > > > > > > > > + bool two_operands = true, vec > > *ops = > > > > > > NULL) > > > > > > > > > > +{ > > > > > > > > > > + complex_operation_t result = CMPLX_NONE; > > > > > > > > > > + > > > > > > > > > > + if (vect_match_expression_p (node1, MINUS_EXPR) > > > > > > > > > > + && vect_match_expression_p (node2, PLUS_EXPR) > > > > > > > > > > + && (!two_operands || vect_check_lane_permute (lanes, > > 0, > > > > 1))) > > > > > > > > > > + result = MINUS_PLUS; > > > > > > > > > > + else if (vect_match_expression_p (node1, PLUS_EXPR) > > > > > > > > > > + && vect_match_expression_p (node2, > > MINUS_EXPR) > > > > > > > > > > + && (!two_operands || vect_check_lane_permute > > (lanes, 0, > > > > > > 1))) > > > > > > > > > > + result = PLUS_MINUS; > > > > > > > > > > + else if (vect_match_expression_p (node1, PLUS_EXPR) > > > > > > > > > > + && vect_match_expression_p (node2, PLUS_EXPR)) > > > > > > > > > > + result = PLUS_PLUS; > > > > > > > > > > + else if (vect_match_expression_p (node1, MULT_EXPR) > > > > > > > > > > + && vect_match_expression_p (node2, > > MULT_EXPR)) > > > > > > > > > > + result = MULT_MULT; > > > > > > > > > > + > > > > > > > > > > + if (result != CMPLX_NONE && ops != NULL) > > > > > > > > > > + { > > > > > > > > > > + ops->create (2); > > > > > > > > > > + ops->quick_push (node1); > > > > > > > > > > + ops->quick_push (node2); > > > > > > > > > > + } > > > > > > > > > > + return result; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* Overload of vect_detect_pair_op that matches against the > > > > > > > > > representative > > > > > > > > > > + statements in the children of NODE. It is expected that > > NODE > > > > has > > > > > > > > > exactly > > > > > > > > > > + two children and when TWO_OPERANDS then NODE must > > be a > > > > > > > > > VEC_PERM. */ > > > > > > > > > > + > > > > > > > > > > +static complex_operation_t > > > > > > > > > > +vect_detect_pair_op (slp_tree node, bool two_operands = > > true, > > > > > > > > > > + vec *ops = NULL) > > > > > > > > > > +{ > > > > > > > > > > + if (!two_operands && SLP_TREE_CODE (node) == > > > > VEC_PERM_EXPR) > > > > > > > > > > + return CMPLX_NONE; > > > > > > > > > > + > > > > > > > > > > + if (SLP_TREE_CHILDREN (node).length () != 2) > > > > > > > > > > + return CMPLX_NONE; > > > > > > > > > > + > > > > > > > > > > + vec children = SLP_TREE_CHILDREN (node); > > > > > > > > > > + lane_permutation_t &lanes = > > SLP_TREE_LANE_PERMUTATION > > > > > > (node); > > > > > > > > > > + > > > > > > > > > > + return vect_detect_pair_op (children[0], children[1], lanes, > > > > > > > > > two_operands, > > > > > > > > > > + ops); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +/********************************************************* > > > > > > > > > ********************** > > > > > > > > > > + * complex_pattern class > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > ********************************************************** > > > > > > > > > ********************/ > > > > > > > > > > + > > > > > > > > > > +/* SLP Complex Numbers pattern matching. > > > > > > > > > > + > > > > > > > > > > + As an example, the following simple loop: > > > > > > > > > > + > > > > > > > > > > + double a[restrict N]; double b[restrict N]; double c[restrict > > N]; > > > > > > > > > > + > > > > > > > > > > + for (int i=0; i < N; i+=2) > > > > > > > > > > + { > > > > > > > > > > + c[i] = a[i] - b[i+1]; > > > > > > > > > > + c[i+1] = a[i+1] + b[i]; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + which represents a complex addition on with a rotation of > > 90* > > > > > > around > > > > > > > > > the > > > > > > > > > > + argand plane. i.e. if `a` and `b` were complex numbers then > > this > > > > > > would be > > > > > > > > > the > > > > > > > > > > + same as `a + (b * I)`. > > > > > > > > > > + > > > > > > > > > > + Here the expressions for `c[i]` and `c[i+1]` are independent > > but > > > > > > have to > > > > > > > > > be > > > > > > > > > > + both recognized in order for the pattern to work. As an SLP > > tree > > > > > > this is > > > > > > > > > > + represented as > > > > > > > > > > + > > > > > > > > > > + +--------------------------------+ > > > > > > > > > > + | stmt 0 *_9 = _10; | > > > > > > > > > > + | stmt 1 *_15 = _16; | > > > > > > > > > > + +--------------------------------+ > > > > > > > > > > + | > > > > > > > > > > + | > > > > > > > > > > + v > > > > > > > > > > + +--------------------------------+ > > > > > > > > > > + | stmt 0 _10 = _4 - _8; | > > > > > > > > > > + | stmt 1 _16 = _12 + _14; | > > > > > > > > > > + | lane permutation { 0[0] 1[1] } | > > > > > > > > > > + +--------------------------------+ > > > > > > > > > > + | | > > > > > > > > > > + | | > > > > > > > > > > + | | > > > > > > > > > > + +-----+ | | +-----+ > > > > > > > > > > + | | | | | | > > > > > > > > > > + +-----| { } |<-----+ +----->| { } --------+ > > > > > > > > > > + | | | +------------------| | | > > > > > > > > > > + | +-----+ | +-----+ | > > > > > > > > > > + | | | | > > > > > > > > > > + | | | | > > > > > > > > > > + | +------|------------------+ | > > > > > > > > > > + | | | | > > > > > > > > > > + v v v v > > > > > > > > > > + +--------------------------+ +--------------------------------+ > > > > > > > > > > + | stmt 0 _8 = *_7; | | stmt 0 _4 = *_3; | > > > > > > > > > > + | stmt 1 _14 = *_13; | | stmt 1 _12 = *_11; | > > > > > > > > > > + | load permutation { 1 0 } | | load permutation { 0 1 } | > > > > > > > > > > + +--------------------------+ +--------------------------------+ > > > > > > > > > > + > > > > > > > > > > + The pattern matcher allows you to replace both statements > > 0 > > > > and 1 > > > > > > or > > > > > > > > > none at > > > > > > > > > > + all. Because this operation is a two operands operation the > > > > actual > > > > > > nodes > > > > > > > > > > + being replaced are those in the { } nodes. The actual scalar > > > > > > statements > > > > > > > > > > + themselves are not replaced or used during the matching > > but > > > > > > instead the > > > > > > > > > > + SLP_TREE_REPRESENTATIVE statements are inspected. You > > are > > > > > > also > > > > > > > > > allowed to > > > > > > > > > > + replace and match on any number of nodes. > > > > > > > > > > + > > > > > > > > > > + Because the pattern matcher matches on the > > representative > > > > > > statement > > > > > > > > > for the > > > > > > > > > > + SLP node the case of two_operators it allows you to match > > the > > > > > > children > > > > > > > > > of the > > > > > > > > > > + node. This is done using the method `recognize ()`. > > > > > > > > > > + > > > > > > > > > > +*/ > > > > > > > > > > + > > > > > > > > > > +/* The complex_pattern class contains common code for > > pattern > > > > > > > > > matchers that work > > > > > > > > > > + on complex numbers. These provide functionality to allow > > de- > > > > > > > > > construction and > > > > > > > > > > + validation of sequences depicting/transforming REAL and > > IMAG > > > > > > pairs. */ > > > > > > > > > > + > > > > > > > > > > +class complex_pattern : public vect_pattern > > > > > > > > > > +{ > > > > > > > > > > + protected: > > > > > > > > > > + auto_vec m_workset; > > > > > > > > > > + complex_pattern (slp_tree *node, vec *m_ops, > > > > > > internal_fn > > > > > > > > > ifn) > > > > > > > > > > + : vect_pattern (node, m_ops, ifn) > > > > > > > > > > + { > > > > > > > > > > + this->m_workset.safe_push (*node); > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + public: > > > > > > > > > > + void build (slp_tree_to_load_perm_map_t *, vec_info *); > > > > > > > > > > + > > > > > > > > > > + static internal_fn > > > > > > > > > > + matches (complex_operation_t op, > > > > > > slp_tree_to_load_perm_map_t *, > > > > > > > > > > + vec *); > > > > > > > > > > +}; > > > > > > > > > > + > > > > > > > > > > +/* Create a replacement pattern statement for each node in > > > > > > m_node and > > > > > > > > > inserts > > > > > > > > > > + the new statement into m_node as the new representative > > > > > > statement. > > > > > > > > > The old > > > > > > > > > > + statement is marked as being in a pattern defined by the > > new > > > > > > statement. > > > > > > > > > The > > > > > > > > > > + statement is created as call to internal function IFN with > > > > > > m_num_args > > > > > > > > > > + arguments. > > > > > > > > > > + > > > > > > > > > > + Futhermore the new pattern is also added to the > > vectorization > > > > > > > > > information > > > > > > > > > > + structure VINFO and the old statement STMT_INFO is > > marked > > > > as > > > > > > unused > > > > > > > > > while > > > > > > > > > > + the new statement is marked as used and the number of > > SLP > > > > uses > > > > > > of the > > > > > > > > > new > > > > > > > > > > + statement is incremented. > > > > > > > > > > + > > > > > > > > > > + The newly created SLP nodes are marked as SLP only and > > will > > > > be > > > > > > > > > dissolved > > > > > > > > > > + if SLP is aborted. > > > > > > > > > > + > > > > > > > > > > + The newly created gimple call is returned and the BB > > remains > > > > > > unchanged. > > > > > > > > > > + > > > > > > > > > > + This default method is designed to only match against > > simple > > > > > > operands > > > > > > > > > where > > > > > > > > > > + all the input and output types are the same. > > > > > > > > > > +*/ > > > > > > > > > > + > > > > > > > > > > +void > > > > > > > > > > +complex_pattern::build (slp_tree_to_load_perm_map_t > > > > > > *perm_cache, > > > > > > > > > > + vec_info *vinfo) > > > > > > > > > > +{ > > > > > > > > > > + stmt_vec_info stmt_info; > > > > > > > > > > + > > > > > > > > > > + auto_vec args; > > > > > > > > > > + args.create (this->m_num_args); > > > > > > > > > > + args.quick_grow_cleared (this->m_num_args); > > > > > > > > > > + slp_tree node; > > > > > > > > > > + unsigned ix; > > > > > > > > > > + stmt_vec_info call_stmt_info; > > > > > > > > > > + gcall *call_stmt = NULL; > > > > > > > > > > + auto_vec nodes; > > > > > > > > > > + slp_tree tmp = NULL; > > > > > > > > > > + node = this->m_ops[0]; > > > > > > > > > > + > > > > > > > > > > + /* First re-arrange the children. */ > > > > > > > > > > + > > > > > > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), ix, tmp) > > > > > > > > > > + { > > > > > > > > > > + slp_tree vnode = NULL; > > > > > > > > > > + if (vect_slp_make_linear (perm_cache, node, tmp, > > &vnode)) > > > > > > > > > > + nodes.safe_push (vnode); > > > > > > > > > > + else > > > > > > > > > > + { > > > > > > > > > > + FOR_EACH_VEC_ELT (nodes, ix, tmp) > > > > > > > > > > + vect_free_slp_tree (tmp); > > > > > > > > > > + > > > > > > > > > > + return; > > > > > > > > > > + } > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + FOR_EACH_VEC_ELT (this->m_ops, ix, node) > > > > > > > > > > + vect_free_slp_tree (node); > > > > > > > > > > + > > > > > > > > > > + SLP_TREE_CHILDREN (*this->m_node).truncate (0); > > > > > > > > > > + SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes); > > > > > > > > > > + > > > > > > > > > > + /* Now modify the nodes themselves. */ > > > > > > > > > > + FOR_EACH_VEC_ELT (this->m_workset, ix, node) > > > > > > > > > > + { > > > > > > > > > > + /* Calculate the location of the statement in NODE to > > replace. > > > > */ > > > > > > > > > > + stmt_info = SLP_TREE_REPRESENTATIVE (node); > > > > > > > > > > + gimple* old_stmt = STMT_VINFO_STMT (stmt_info); > > > > > > > > > > + tree lhs_old_stmt = gimple_get_lhs (old_stmt); > > > > > > > > > > + tree type = TREE_TYPE (lhs_old_stmt); > > > > > > > > > > + > > > > > > > > > > + /* Create the argument set for use by > > > > > > gimple_build_call_internal_vec. > > > > > > > > > */ > > > > > > > > > > + for (unsigned i = 0; i < this->m_num_args; i++) > > > > > > > > > > + args[i] = lhs_old_stmt; > > > > > > > > > > + > > > > > > > > > > + /* Create the new pattern statements. */ > > > > > > > > > > + call_stmt = gimple_build_call_internal_vec (this->m_ifn, > > args); > > > > > > > > > > + tree var = make_temp_ssa_name (type, call_stmt, > > > > "slp_patt"); > > > > > > > > > > + gimple_call_set_lhs (call_stmt, var); > > > > > > > > > > + gimple_set_location (call_stmt, gimple_location > > (old_stmt)); > > > > > > > > > > + gimple_call_set_nothrow (call_stmt, true); > > > > > > > > > > + > > > > > > > > > > + /* Adjust the book-keeping for the new and old > > statements > > > > for > > > > > > use > > > > > > > > > during > > > > > > > > > > + SLP. This is required to get the right VF and > > statement during > > > > > > SLP > > > > > > > > > > + analysis. These changes are created after relevancy > > has > > > > > > been set for > > > > > > > > > > + the nodes as such we need to manually update them. > > Any > > > > > > changes > > > > > > > > > will be > > > > > > > > > > + undone if SLP is cancelled. */ > > > > > > > > > > + call_stmt_info > > > > > > > > > > + = vinfo->add_pattern_stmt (call_stmt, stmt_info); > > > > > > > > > > + STMT_VINFO_RELEVANT (call_stmt_info) = > > > > vect_used_in_scope; > > > > > > > > > > + > > > > > > > > > > + /* Unfortunately still need this on the new pattern > > because > > > > non- > > > > > > loop > > > > > > > > > SLP > > > > > > > > > > + doesn't call vect_detect_hybrid_slp so it never > > updates it. > > > > > > */ > > > > > > > > > > + STMT_SLP_TYPE (call_stmt_info) = pure_slp; > > > > > > > > > > + > > > > > > > > > > + /* add_pattern_stmt can't be done in > > > > vect_mark_pattern_stmts > > > > > > > > > because > > > > > > > > > > + the non-SLP pattern matchers already have added > > the > > > > > > statement to > > > > > > > > > VINFO > > > > > > > > > > + by the time it is called. Some of them need to > > modify the > > > > > > returned > > > > > > > > > > + stmt_info. vect_mark_pattern_stmts is called by > > > > > > recog_pattern and > > > > > > > > > it > > > > > > > > > > + would increase the size of each pattern with > > boilerplate code > > > > > > to > > > > > > > > > make > > > > > > > > > > + the call there. */ > > > > > > > > > > + vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt, > > > > > > > > > > + SLP_TREE_VECTYPE (node)); > > > > > > > > > > + > > > > > > > > > > + /* Since we are replacing all the statements in the group > > with > > > > the > > > > > > same > > > > > > > > > > + thing it doesn't really matter. So just set it every > > time a new > > > > > > stmt > > > > > > > > > > + is created. */ > > > > > > > > > > + SLP_TREE_REPRESENTATIVE (node) = call_stmt_info; > > > > > > > > > > + SLP_TREE_CODE (node) = CALL_EXPR; > > > > > > > > > > + } > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +/********************************************************* > > > > > > > > > ********************** > > > > > > > > > > + * complex_add_pattern class > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > ********************************************************** > > > > > > > > > ********************/ > > > > > > > > > > + > > > > > > > > > > +class complex_add_pattern : public complex_pattern > > > > > > > > > > +{ > > > > > > > > > > + protected: > > > > > > > > > > + complex_add_pattern (slp_tree *node, vec > > > > *m_ops, > > > > > > > > > internal_fn ifn) > > > > > > > > > > + : complex_pattern (node, m_ops, ifn) > > > > > > > > > > + { > > > > > > > > > > + this->m_num_args = 2; > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + public: > > > > > > > > > > + static internal_fn > > > > > > > > > > + matches (complex_operation_t op, > > > > > > slp_tree_to_load_perm_map_t *, > > > > > > > > > > + vec *); > > > > > > > > > > + > > > > > > > > > > + static vect_pattern* > > > > > > > > > > + recognize (slp_tree_to_load_perm_map_t *, slp_tree *); > > > > > > > > > > +}; > > > > > > > > > > + > > > > > > > > > > +/* Pattern matcher for trying to match complex addition > > pattern > > > > in > > > > > > SLP > > > > > > > > > tree. > > > > > > > > > > + > > > > > > > > > > + If no match is found then IFN is set to IFN_LAST. > > > > > > > > > > + This function matches the patterns shaped as: > > > > > > > > > > + > > > > > > > > > > + c[i] = a[i] - b[i+1]; > > > > > > > > > > + c[i+1] = a[i+1] + b[i]; > > > > > > > > > > + > > > > > > > > > > + If a match occurred then TRUE is returned, else FALSE. The > > > > initial > > > > > > match > > > > > > > > > is > > > > > > > > > > + expected to be in OP1 and the initial match operands in > > args0. > > > > */ > > > > > > > > > > + > > > > > > > > > > +internal_fn > > > > > > > > > > +complex_add_pattern::matches (complex_operation_t op, > > > > > > > > > > + slp_tree_to_load_perm_map_t > > > > > > *perm_cache, > > > > > > > > > > + vec *ops) > > > > > > > > > > +{ > > > > > > > > > > + internal_fn ifn = IFN_LAST; > > > > > > > > > > + > > > > > > > > > > + /* Find the two components. Rotation in the complex plane > > will > > > > > > modify > > > > > > > > > > + the operations: > > > > > > > > > > + > > > > > > > > > > + * Rotation 0: + + > > > > > > > > > > + * Rotation 90: - + > > > > > > > > > > + * Rotation 180: - - > > > > > > > > > > + * Rotation 270: + - > > > > > > > > > > + > > > > > > > > > > + Rotation 0 and 180 can be handled by normal SIMD code, > > so > > > > we > > > > > > don't > > > > > > > > > need > > > > > > > > > > + to care about them here. */ > > > > > > > > > > + if (op == MINUS_PLUS) > > > > > > > > > > + ifn = IFN_COMPLEX_ADD_ROT90; > > > > > > > > > > + else if (op == PLUS_MINUS) > > > > > > > > > > + ifn = IFN_COMPLEX_ADD_ROT270; > > > > > > > > > > + else > > > > > > > > > > + return ifn; > > > > > > > > > > + > > > > > > > > > > + /* verify that there is a permute, otherwise this isn't a > > pattern > > > > we > > > > > > > > > > + we support. */ > > > > > > > > > > + bool is_linear = false; > > > > > > > > > > + gcc_assert (ops->length () == 2); > > > > > > > > > > + > > > > > > > > > > + vec children = SLP_TREE_CHILDREN ((*ops)[0]); > > > > > > > > > > + > > > > > > > > > > + /* First node must be unpermuted. */ > > > > > > > > > > + linear_loads_p (perm_cache, children[0], &is_linear); > > > > > > > > > > + if (!is_linear) > > > > > > > > > > + return IFN_LAST; > > > > > > > > > > + > > > > > > > > > > + /* Second node must be permuted. */ > > > > > > > > > > + if (linear_loads_p (perm_cache, children[1], > > &is_linear).length > > > > () > > > > > > > 0 > > > > > > > > > > + && is_linear) > > > > > > > > > > + return IFN_LAST; > > > > > > > > > > + > > > > > > > > > > + return ifn; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +vect_pattern* > > > > > > > > > > +complex_add_pattern::recognize > > > > (slp_tree_to_load_perm_map_t > > > > > > > > > *perm_cache, > > > > > > > > > > + slp_tree *node) > > > > > > > > > > +{ > > > > > > > > > > + auto_vec ops; > > > > > > > > > > + complex_operation_t op > > > > > > > > > > + = vect_detect_pair_op (*node, true, &ops); > > > > > > > > > > + internal_fn ifn = complex_add_pattern::matches (op, > > > > perm_cache, > > > > > > > > > &ops); > > > > > > > > > > + if (!vect_pattern_validate_optab (ifn, *node)) > > > > > > > > > > + return NULL; > > > > > > > > > > + > > > > > > > > > > + return new complex_add_pattern (node, &ops, ifn); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +/********************************************************* > > > > > > > > > ********************** > > > > > > > > > > + * Pattern matching definitions > > > > > > > > > > + > > > > > > > > > > > > > > > > > > > > > ********************************************************** > > > > > > > > > ********************/ > > > > > > > > > > + > > > > > > > > > > +#define SLP_PATTERN(x) &x::recognize > > > > > > > > > > +vect_pattern_decl_t slp_patterns[] > > > > > > > > > > +{ > > > > > > > > > > + /* For least amount of back-tracking and more efficient > > > > matching > > > > > > > > > > + order patterns from the largest to the smallest. Especially > > if > > > > they > > > > > > > > > > + overlap in what they can detect. */ > > > > > > > > > > + > > > > > > > > > > + SLP_PATTERN (complex_add_pattern), > > > > > > > > > > +}; > > > > > > > > > > +#undef SLP_PATTERN > > > > > > > > > > + > > > > > > > > > > +/* Set the number of SLP pattern matchers available. */ > > > > > > > > > > +size_t num__slp_patterns = > > > > > > > > > sizeof(slp_patterns)/sizeof(vect_pattern_decl_t); > > > > > > > > > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > d19874f175703a96b1c1110874067fdbec48c068..7f5fbdbd4969036b5db1cb698 > > > > > > > > > da970304c87b03b 100644 > > > > > > > > > > --- a/gcc/tree-vect-slp.c > > > > > > > > > > +++ b/gcc/tree-vect-slp.c > > > > > > > > > > @@ -105,7 +105,7 @@ _slp_tree::~_slp_tree () > > > > > > > > > > > > > > > > > > > > /* Recursively free the memory allocated for the SLP tree > > rooted > > > > at > > > > > > NODE. > > > > > > > > > */ > > > > > > > > > > > > > > > > > > > > -static void > > > > > > > > > > +void > > > > > > > > > > vect_free_slp_tree (slp_tree node) > > > > > > > > > > { > > > > > > > > > > int i; > > > > > > > > > > @@ -148,7 +148,7 @@ vect_free_slp_instance (slp_instance > > > > instance) > > > > > > > > > > > > > > > > > > > > /* Create an SLP node for SCALAR_STMTS. */ > > > > > > > > > > > > > > > > > > > > -slp_tree > > > > > > > > > > +static slp_tree > > > > > > > > > > vect_create_new_slp_node (slp_tree node, > > > > > > > > > > vec scalar_stmts, unsigned > > > > > > nops) > > > > > > > > > > { > > > > > > > > > > @@ -165,7 +165,7 @@ vect_create_new_slp_node (slp_tree > > > > node, > > > > > > > > > > > > > > > > > > > > /* Create an SLP node for SCALAR_STMTS. */ > > > > > > > > > > > > > > > > > > > > -static slp_tree > > > > > > > > > > +slp_tree > > > > > > > > > > vect_create_new_slp_node (vec > > scalar_stmts, > > > > > > unsigned > > > > > > > > > nops) > > > > > > > > > > { > > > > > > > > > > return vect_create_new_slp_node (new _slp_tree, > > scalar_stmts, > > > > > > nops); > > > > > > > > > > @@ -2175,6 +2175,84 @@ calculate_unrolling_factor > > (poly_uint64 > > > > > > nunits, > > > > > > > > > unsigned int group_size) > > > > > > > > > > return exact_div (common_multiple (nunits, group_size), > > > > > > group_size); > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > +/* Helper function of vect_match_slp_patterns. > > > > > > > > > > + > > > > > > > > > > + Attempts to match patterns against the slp tree rooted in > > > > > > REF_NODE > > > > > > > > > using > > > > > > > > > > + VINFO. Patterns are matched in post-order traversal. > > > > > > > > > > + > > > > > > > > > > + If matching is successful the value in REF_NODE is updated > > and > > > > > > returned, > > > > > > > > > if > > > > > > > > > > + not then it is returned unchanged. */ > > > > > > > > > > + > > > > > > > > > > +static bool > > > > > > > > > > +vect_match_slp_patterns_2 (slp_tree *ref_node, vec_info > > > > *vinfo, > > > > > > > > > > + slp_tree_to_load_perm_map_t > > > > > > *perm_cache, > > > > > > > > > > + hash_set *visited) > > > > > > > > > > +{ > > > > > > > > > > + unsigned i; > > > > > > > > > > + slp_tree node = *ref_node; > > > > > > > > > > + bool found_p = false; > > > > > > > > > > + if (!node || visited->add (node)) > > > > > > > > > > + return false; > > > > > > > > > > + > > > > > > > > > > + slp_tree child; > > > > > > > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) > > > > > > > > > > + found_p |= vect_match_slp_patterns_2 > > > > (&SLP_TREE_CHILDREN > > > > > > > > > (node)[i], > > > > > > > > > > + vinfo, perm_cache, > > visited); > > > > > > > > > > + > > > > > > > > > > + for (unsigned x = 0; x < num__slp_patterns; x++) > > > > > > > > > > + { > > > > > > > > > > + vect_pattern *pattern = slp_patterns[x] (perm_cache, > > > > ref_node); > > > > > > > > > > + if (pattern) > > > > > > > > > > + { > > > > > > > > > > + pattern->build (perm_cache, vinfo); > > > > > > > > > > + delete pattern; > > > > > > > > > > + found_p = true; > > > > > > > > > > + } > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + return found_p; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* Applies pattern matching to the given SLP tree rooted in > > > > > > REF_NODE > > > > > > > > > using > > > > > > > > > > + vec_info VINFO. > > > > > > > > > > + > > > > > > > > > > + The modified tree is returned. Patterns are tried in order > > and > > > > > > multiple > > > > > > > > > > + patterns may match. */ > > > > > > > > > > + > > > > > > > > > > +static bool > > > > > > > > > > +vect_match_slp_patterns (slp_instance instance, vec_info > > > > *vinfo, > > > > > > > > > > + hash_set *visited, > > > > > > > > > > + slp_tree_to_load_perm_map_t > > > > > > *perm_cache, > > > > > > > > > > + scalar_stmts_to_slp_tree_map_t * > > /* > > > > > > bst_map */) > > > > > > > > > > +{ > > > > > > > > > > + DUMP_VECT_SCOPE ("vect_match_slp_patterns"); > > > > > > > > > > + slp_tree *ref_node = &SLP_INSTANCE_TREE (instance); > > > > > > > > > > + > > > > > > > > > > + if (dump_enabled_p ()) > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > > > > > > > + "Analyzing SLP tree %p for patterns\n", > > > > > > > > > > + SLP_INSTANCE_TREE (instance)); > > > > > > > > > > + > > > > > > > > > > + bool found_p > > > > > > > > > > + = vect_match_slp_patterns_2 (ref_node, vinfo, > > perm_cache, > > > > > > visited); > > > > > > > > > > + > > > > > > > > > > + if (found_p) > > > > > > > > > > + { > > > > > > > > > > + if (dump_enabled_p ()) > > > > > > > > > > + { > > > > > > > > > > + dump_printf_loc (MSG_NOTE, vect_location, > > > > > > > > > > + "Pattern matched SLP tree\n"); > > > > > > > > > > + vect_print_slp_graph (MSG_NOTE, vect_location, > > > > > > *ref_node); > > > > > > > > > > + } > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + return found_p; > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > +/* Analyze an SLP instance starting from a group of grouped > > > > stores. > > > > > > Call > > > > > > > > > > + vect_build_slp_tree to build a tree of packed stmts if > > possible. > > > > > > > > > > + Return FALSE if it's impossible to SLP any stmt in the loop. > > */ > > > > > > > > > > + > > > > > > > > > > static bool > > > > > > > > > > vect_analyze_slp_instance (vec_info *vinfo, > > > > > > > > > > scalar_stmts_to_slp_tree_map_t *bst_map, > > > > > > > > > > @@ -2540,6 +2618,7 @@ vect_analyze_slp (vec_info *vinfo, > > > > unsigned > > > > > > > > > max_tree_size) > > > > > > > > > > { > > > > > > > > > > unsigned int i; > > > > > > > > > > stmt_vec_info first_element; > > > > > > > > > > + slp_instance instance; > > > > > > > > > > > > > > > > > > > > DUMP_VECT_SCOPE ("vect_analyze_slp"); > > > > > > > > > > > > > > > > > > > > @@ -2586,6 +2665,13 @@ vect_analyze_slp (vec_info *vinfo, > > > > > > unsigned > > > > > > > > > max_tree_size) > > > > > > > > > > slp_inst_kind_reduc_group, > > > > > > > > > max_tree_size); > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > + hash_set visited_patterns; > > > > > > > > > > + slp_tree_to_load_perm_map_t perm_cache; > > > > > > > > > > + /* See if any patterns can be found in the SLP tree. */ > > > > > > > > > > + FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), > > i, > > > > > > instance) > > > > > > > > > > + vect_match_slp_patterns (instance, vinfo, > > &visited_patterns, > > > > > > > > > &perm_cache, > > > > > > > > > > + bst_map); > > > > > > > > > > + > > > > > > > > > > /* The map keeps a reference on SLP nodes built, release > > that. > > > > */ > > > > > > > > > > for (scalar_stmts_to_slp_tree_map_t::iterator it = bst_map- > > > > >begin > > > > > > (); > > > > > > > > > > it != bst_map->end (); ++it) > > > > > > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > > > > > > > > > > index > > > > > > > > > > > > > > > > > > > > > 91e2e10761d591b99ad55467e4719219ea5c0e49..ea39f56365e6c6fcbaaeb9cde > > > > > > > > > 769a81a109d6af3 100644 > > > > > > > > > > --- a/gcc/tree-vectorizer.h > > > > > > > > > > +++ b/gcc/tree-vectorizer.h > > > > > > > > > > @@ -27,6 +27,7 @@ typedef class _stmt_vec_info > > > > *stmt_vec_info; > > > > > > > > > > #include "tree-hash-traits.h" > > > > > > > > > > #include "target.h" > > > > > > > > > > #include "alloc-pool.h" > > > > > > > > > > +#include "internal-fn.h" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > /* Used for naming of new temporaries. */ > > > > > > > > > > @@ -1994,6 +1995,7 @@ extern void > > duplicate_and_interleave > > > > > > (vec_info *, > > > > > > > > > gimple_seq *, tree, > > > > > > > > > > extern int vect_get_place_in_interleaving_chain > > (stmt_vec_info, > > > > > > > > > stmt_vec_info); > > > > > > > > > > extern bool vect_update_shared_vectype (stmt_vec_info, > > tree); > > > > > > > > > > extern slp_tree vect_create_new_slp_node > > > > (vec, > > > > > > > > > unsigned); > > > > > > > > > > +extern void vect_free_slp_tree (slp_tree); > > > > > > > > > > > > > > > > > > > > /* In tree-vect-patterns.c. */ > > > > > > > > > > extern void > > > > > > > > > > @@ -2010,4 +2012,67 @@ void > > vect_free_loop_info_assumptions > > > > > > (class > > > > > > > > > loop *); > > > > > > > > > > gimple *vect_loop_vectorized_call (class loop *, gcond > > **cond = > > > > > > NULL); > > > > > > > > > > bool vect_stmt_dominates_stmt_p (gimple *, gimple *); > > > > > > > > > > > > > > > > > > > > +/* SLP Pattern matcher types, tree-vect-slp-patterns.c. */ > > > > > > > > > > + > > > > > > > > > > +/* Forward declaration of possible two operands operation > > that > > > > can > > > > > > be > > > > > > > > > matched > > > > > > > > > > + by the complex numbers pattern matchers. */ > > > > > > > > > > +enum _complex_operation : unsigned; > > > > > > > > > > + > > > > > > > > > > +/* Cache from nodes to the load permutation they represent. > > */ > > > > > > > > > > +typedef hash_map > > > > > > > > > > + slp_tree_to_load_perm_map_t; > > > > > > > > > > + > > > > > > > > > > +/* Vector pattern matcher base class. All SLP pattern > > matchers > > > > must > > > > > > > > > inherit > > > > > > > > > > + from this type. */ > > > > > > > > > > + > > > > > > > > > > +class vect_pattern > > > > > > > > > > +{ > > > > > > > > > > + protected: > > > > > > > > > > + /* The number of arguments that the IFN requires. */ > > > > > > > > > > + unsigned m_num_args; > > > > > > > > > > + > > > > > > > > > > + /* The internal function that will be used when a pattern is > > > > created. > > > > > > */ > > > > > > > > > > + internal_fn m_ifn; > > > > > > > > > > + > > > > > > > > > > + /* The current node being inspected. */ > > > > > > > > > > + slp_tree *m_node; > > > > > > > > > > + > > > > > > > > > > + /* The list of operands to be the children for the node > > > > produced > > > > > > when > > > > > > > > > the > > > > > > > > > > + internal function is created. */ > > > > > > > > > > + vec m_ops; > > > > > > > > > > + > > > > > > > > > > + /* Default constructor where NODE is the root of the tree > > to > > > > > > inspect. */ > > > > > > > > > > + vect_pattern (slp_tree *node, vec *m_ops, > > > > internal_fn > > > > > > ifn) > > > > > > > > > > + { > > > > > > > > > > + this->m_ifn = ifn; > > > > > > > > > > + this->m_node = node; > > > > > > > > > > + this->m_ops.create (0); > > > > > > > > > > + this->m_ops.safe_splice (*m_ops); > > > > > > > > > > + } > > > > > > > > > > + > > > > > > > > > > + public: > > > > > > > > > > + > > > > > > > > > > + /* Create a new instance of the pattern matcher class of > > the > > > > given > > > > > > type. > > > > > > > > > */ > > > > > > > > > > + static vect_pattern* recognize > > > > (slp_tree_to_load_perm_map_t *, > > > > > > > > > slp_tree *); > > > > > > > > > > + > > > > > > > > > > + /* Build the pattern from the data collected so far. */ > > > > > > > > > > + virtual void build (slp_tree_to_load_perm_map_t *, > > vec_info > > > > *) = > > > > > > 0; > > > > > > > > > > + > > > > > > > > > > + /* Default destructor. */ > > > > > > > > > > + virtual ~vect_pattern () > > > > > > > > > > + { > > > > > > > > > > + this->m_ops.release (); > > > > > > > > > > + } > > > > > > > > > > +}; > > > > > > > > > > + > > > > > > > > > > +/* Function pointer to create a new pattern matcher from a > > > > generic > > > > > > type. > > > > > > > > > */ > > > > > > > > > > +typedef vect_pattern* (*vect_pattern_decl_t) > > > > > > > > > (slp_tree_to_load_perm_map_t *, > > > > > > > > > > + slp_tree *); > > > > > > > > > > + > > > > > > > > > > +/* List of supported pattern matchers. */ > > > > > > > > > > +extern vect_pattern_decl_t slp_patterns[]; > > > > > > > > > > + > > > > > > > > > > +/* Number of supported pattern matchers. */ > > > > > > > > > > +extern size_t num__slp_patterns; > > > > > > > > > > + > > > > > > > > > > #endif /* GCC_TREE_VECTORIZER_H */ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Richard Biener > > > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, > > 90409 > > > > > > > > > Nuernberg, > > > > > > > > > Germany; GF: Felix Imend > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Richard Biener > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > > > > > Nuernberg, > > > > > > Germany; GF: Felix Imend > > > > > > > > > > > > > -- > > > > Richard Biener > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > > > Nuernberg, > > > > Germany; GF: Felix Imend > > > > > > > > > > -- > > Richard Biener > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 > > Nuernberg, > > Germany; GF: Felix Imend > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend