From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd, Richard Earnshaw, Marcus Shawcroft,
 Kyrylo Tkachov, rguenther@suse.de, roger@nextmovesoftware.com
Subject: Re: [PATCH]AArch64 relax predicate on load structure load instructions
Date: Thu, 09 Jun 2022 09:22:19 +0100
In-Reply-To: Tamar Christina's message of "Thu, 9 Jun 2022 07:42:36 +0000"

Tamar Christina writes:
>> -----Original Message-----
>> From: Richard Sandiford
>> Sent: Wednesday, June 8, 2022 3:36 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
>> Kyrylo Tkachov; rguenther@suse.de; roger@eyesopen.com
>> Subject: Re: [PATCH]AArch64 relax predicate on load structure load
>> instructions
>>
>> Tamar Christina writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford
>> >> Sent: Wednesday, June 8, 2022 11:31 AM
>> >> To: Tamar Christina
>> >> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft;
>> >> Kyrylo Tkachov
>> >> Subject: Re: [PATCH]AArch64 relax predicate on load structure load
>> >> instructions
>> >>
>> >> Tamar Christina writes:
>> >> > Hi All,
>> >> >
>> >> > At some point in time we started lowering the ld1r instructions in
>> >> > gimple.
>> >> >
>> >> > That is:
>> >> >
>> >> > uint8x8_t f1(const uint8_t *in) {
>> >> >     return vld1_dup_u8(&in[1]);
>> >> > }
>> >> >
>> >> > generates at gimple:
>> >> >
>> >> > _3 = MEM[(const uint8_t *)in_1(D) + 1B];
>> >> > _4 = {_3, _3, _3, _3, _3, _3, _3, _3};
>> >> >
>> >> > Which is good, but we then generate:
>> >> >
>> >> > f1:
>> >> >     ldr    b0, [x0, 1]
>> >> >     dup    v0.8b, v0.b[0]
>> >> >     ret
>> >> >
>> >> > instead of ld1r.
>> >> >
>> >> > The reason for this is that the load instructions have too
>> >> > restrictive a predicate on them, which prevents combine from
>> >> > combining the instructions, since the predicate only accepts simple
>> >> > addressing modes.
>> >> >
>> >> > This patch relaxes the predicate to accept any memory operand and
>> >> > relies on LRA to legitimize the address when it needs to, as the
>> >> > constraint still only allows the simple addressing modes.  Reload is
>> >> > always able to legitimize to these forms.
>> >> >
>> >> > Secondly, since we are now actually generating more ld1r it became
>> >> > clear that the lane instructions suffer from a similar issue.
>> >> >
>> >> > i.e.
>> >> >
>> >> > float32x4_t f2(const float32_t *in, float32x4_t a) {
>> >> >     float32x4_t dup = vld1q_dup_f32(&in[1]);
>> >> >     return vfmaq_laneq_f32 (a, a, dup, 1);
>> >> > }
>> >> >
>> >> > would generate ld1r + vector fmla instead of ldr + lane fmla.
>> >> >
>> >> > The reason for this is similar to the ld1r issue: the predicate is
>> >> > too restrictive in only accepting register operands but not memory.
>> >> >
>> >> > This relaxes it to accept register and/or memory while leaving the
>> >> > constraint to only accept registers.  This will have LRA generate a
>> >> > reload if needed, forcing the memory to registers using the standard
>> >> > patterns.
>> >> >
>> >> > These two changes allow combine and reload to generate the right
>> >> > sequences.
>> >> >
>> >> > Bootstrapped and regtested on aarch64-none-linux-gnu and no issues.
>> >>
>> >> This is going against the general direction of travel, which is to
>> >> make the instruction's predicates and conditions enforce the
>> >> constraints as much as possible (making optimistic assumptions about
>> >> pseudo registers).
>> >>
>> >> The RA *can* deal with things like:
>> >>
>> >>   (match_operand:M N "general_operand" "r")
>> >>
>> >> but it's best avoided, for a few reasons:
>> >>
>> >> (1) The fix-up will be done in LRA, so IRA will not see the temporary
>> >>     registers.  This can make the allocation of those temporaries
>> >>     suboptimal but (more importantly) it might require other
>> >>     previously-allocated registers to be spilled late due to the
>> >>     unexpected increase in register pressure.
>> >>
>> >> (2) It ends up hiding instructions from the pre-RA optimisers.
>> >>
>> >> (3) It can also prevent combine opportunities (as well as create them),
>> >>     unless the loose predicates in an insn I are propagated to all
>> >>     patterns that might result from combining I with something else.
>> >>
>> >> It sounds like the first problem (not generating ld1r) could be fixed
>> >> by (a) combining aarch64_simd_dup<mode> and *aarch64_simd_ld1r<mode>,
>> >> so that the register and memory alternatives are in the same pattern,
>> >> and (b) using the merged instruction(s) to implement the vec_duplicate
>> >> optab.  Target-independent code should then make the address satisfy
>> >> the predicate, simplifying the address where necessary.
>> >>
>> >
>> > I think I am likely missing something here.  I would assume that you
>> > wanted to use the optab to split the addressing off from the mem
>> > expression so the combined insn matches.
>> >
>> > But in that case, why do you need to combine the two instructions?
>> > I've tried and it doesn't work, since the vec_duplicate optab doesn't
>> > see the mem as op1, because in gimple the mem is not part of the
>> > duplicate.
>> >
>> > So you still just see:
>> >
>> >>>> dbgrtx (ops[1].value)
>> > (subreg/s/v:QI (reg:SI 92 [ _3 ]) 0)
>> >
>> > as the operand, since the argument to the dup is just an SSA_NAME.
>>
>> Ah, yeah, I'd forgotten that fixed-length vec_duplicates would come from a
>> constructor rather than a vec_duplicate_expr, so we don't get the usual
>> benefit of folding single-use mems during expand.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595362.html
>> moves towards using vec_duplicate even for fixed-length vectors.
>> If we take that approach, then I suppose a plain constructor should be
>> folded to a vec_duplicate where possible.
>>
>> (Alternatively, we could use an extended vec_perm_expr with scalar inputs,
>> as Richi suggested in that thread.)
>>
>> If we don't do that, or don't do it yet, then…
>>
>> > If not, and you wanted the combined insn to accept
>> >
>> > (set (reg:SI 92 [ _3 ])
>> >      (zero_extend:SI (mem:QI (plus:DI (reg:DI 97)
>> >              (const_int 1 [0x1])) [0 MEM[(const uint8_tD.4561
>> > *)in_1(D) + 1B]+0 S1 A8])))
>> >
>> > then that's also not possible without relaxing the combined
>> > predicates.  As far as I can tell, if I'm not allowed to use LRA for
>> > this, then the only thing that could work is an early split?
>> >
>> > Or do I have to modify store_constructor to try a variant where it
>> > tries pushing in the Decl of an SSA_NAME first?
>>
>> …yeah, something like this would be needed.  But the vec_duplicate_expr/
>> vec_perm_expr thing seems better, even if we only introduce it during isel.
>>
>> Not my call either way though :-)  Let's see what Richi (cc:ed) thinks.
>
> FWIW, since my inner "Richards like patch" detector still needs tuning 😊
> I did a quick experiment.  Teaching gimple_build_vector_from_val to allow
> the non-constant case and then teaching simplify_vector_constructor to use
> it for the non-constant case gets them generated.
>
> Then I had to teach aarch64_expand_vector_init to generate a
> vec_duplicate_expr when the value is non-constant, which works.
>
> I thought about skipping vec_init entirely in this case during expansion,
> however there doesn't seem to be a way to test for vec_duplicate_expr: as
> Richi mentioned, it doesn't seem to have an associated optab.

I don't understand, sorry.  The optab is vec_duplicate_optab (generated
via expand_vector_broadcast), and although we don't implement that for
Advanced SIMD yet, the point of the above was that we would.

Thanks,
Richard

>
> This approach does fix the problem, but I'll hold off on cleaning it up
> till I hear it's acceptable.
>
> Cheers,
> Tamar
>
>>
>> Thanks,
>> Richard
>>
>> > I guess this also only really works for ld1r; whenever we lower ld2(r)
>> > etc. we'll have the same issue again...  But I suppose that's for the
>> > next person 😊
>> >
>> > Thanks,
>> > Tamar
>> >
>> >> I'm not sure whether fixing the ld1r problem that way will avoid the
>> >> vfmaq_laneq_f32 problem; let me know if not.
>> >>
>> >> Thanks,
>> >> Richard
>> >>
>> >> > Ok for master?
>> >> >
>> >> > Thanks,
>> >> > Tamar
>> >> >
>> >> > gcc/ChangeLog:
>> >> >
>> >> >     * config/aarch64/aarch64-simd.md (mul_lane3, mul_laneq3,
>> >> >     mul_n3, *aarch64_mul3_elt_to_64v2df, *aarch64_mla_elt,
>> >> >     *aarch64_mla_elt_, aarch64_mla_n, *aarch64_mls_elt,
>> >> >     *aarch64_mls_elt_, aarch64_mls_n, *aarch64_fma4_elt,
>> >> >     *aarch64_fma4_elt_, *aarch64_fma4_elt_from_dup,
>> >> >     *aarch64_fma4_elt_to_64v2df, *aarch64_fnma4_elt,
>> >> >     *aarch64_fnma4_elt_, *aarch64_fnma4_elt_from_dup,
>> >> >     *aarch64_fnma4_elt_to_64v2df, *aarch64_mulx_elt_,
>> >> >     *aarch64_mulx_elt, *aarch64_mulx_elt_from_dup,
>> >> >     *aarch64_vgetfmulx): Relax register_operand to
>> >> >     nonimmediate_operand.
>> >> >     (aarch64_simd_ld2, aarch64_simd_ld2r,
>> >> >     aarch64_vec_load_lanes_lane, vec_load_lanes,
>> >> >     aarch64_simd_st2, aarch64_vec_store_lanes_lane,
>> >> >     vec_store_lanes, aarch64_simd_ld3, aarch64_simd_ld3r,
>> >> >     aarch64_vec_load_lanes_lane, vec_load_lanes,
>> >> >     aarch64_simd_st3, aarch64_vec_store_lanes_lane,
>> >> >     vec_store_lanes, aarch64_simd_ld4, aarch64_simd_ld4r,
>> >> >     aarch64_vec_load_lanes_lane, vec_load_lanes,
>> >> >     aarch64_simd_st4, aarch64_vec_store_lanes_lane,
>> >> >     vec_store_lanes, aarch64_ld1_x3_, aarch64_ld1_x4_,
>> >> >     aarch64_st1_x2_, aarch64_st1_x3_, aarch64_st1_x4_,
>> >> >     aarch64_be_ld1, aarch64_be_st1,
>> >> >     aarch64_ld2_dreg, aarch64_ld2_dreg,
>> >> >     aarch64_ld3_dreg, aarch64_ld3_dreg,
>> >> >     aarch64_ld4_dreg, aarch64_ld4_dreg,
>> >> >     aarch64_st2_dreg, aarch64_st2_dreg,
>> >> >     aarch64_st3_dreg, aarch64_st3_dreg,
>> >> >     aarch64_st4_dreg, aarch64_st4_dreg,
>> >> >     *aarch64_simd_ld1r, aarch64_simd_ld1_x2): Relax
>> >> >     aarch64_simd_struct_operand to memory_operand.
>> >> >     * config/aarch64/predicates.md (aarch64_simd_struct_operand):
>> >> >     Remove.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >
>> >> >     * gcc.target/aarch64/vld1r.c: New test.
>> >> > >> >> > --- inline copy of patch -- >> >> > diff --git a/gcc/config/aarch64/aarch64-simd.md >> >> > b/gcc/config/aarch64/aarch64-simd.md >> >> > index >> >> > >> >> >> be5c70bbb7520ae93d19c4a432ce34863e5b9a64..24e3274ddda2ea76c83571fa >> >> da8f >> >> > f4c953b752a1 100644 >> >> > --- a/gcc/config/aarch64/aarch64-simd.md >> >> > +++ b/gcc/config/aarch64/aarch64-simd.md >> >> > @@ -712,7 +712,7 @@ (define_insn "mul_lane3" >> >> > (mult:VMULD >> >> > (vec_duplicate:VMULD >> >> > (vec_select: >> >> > - (match_operand: 2 "register_operand" "") >> >> > + (match_operand: 2 "nonimmediate_operand" >> >> "") >> >> > (parallel [(match_operand:SI 3 "immediate_operand" "i")]))) >> >> > (match_operand:VMULD 1 "register_operand" "w")))] >> >> > "TARGET_SIMD" >> >> > @@ -728,7 +728,7 @@ (define_insn "mul_laneq3" >> >> > (mult:VMUL >> >> > (vec_duplicate:VMUL >> >> > (vec_select: >> >> > - (match_operand: 2 "register_operand" "") >> >> > + (match_operand: 2 "nonimmediate_operand" >> >> "") >> >> > (parallel [(match_operand:SI 3 "immediate_operand")]))) >> >> > (match_operand:VMUL 1 "register_operand" "w")))] >> >> > "TARGET_SIMD" >> >> > @@ -743,7 +743,7 @@ (define_insn "mul_n3" >> >> > [(set (match_operand:VMUL 0 "register_operand" "=3Dw") >> >> > (mult:VMUL >> >> > (vec_duplicate:VMUL >> >> > - (match_operand: 2 "register_operand" "")) >> >> > + (match_operand: 2 "nonimmediate_operand" "")) >> >> > (match_operand:VMUL 1 "register_operand" "w")))] >> >> > "TARGET_SIMD" >> >> > "mul\t%0., %1., %2.[0]"; @@ -789,7 >> >> > +789,7 @@ (define_insn "*aarch64_mul3_elt_to_64v2df" >> >> > [(set (match_operand:DF 0 "register_operand" "=3Dw") >> >> > (mult:DF >> >> > (vec_select:DF >> >> > - (match_operand:V2DF 1 "register_operand" "w") >> >> > + (match_operand:V2DF 1 "nonimmediate_operand" "w") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")])) >> >> > (match_operand:DF 3 "register_operand" "w")))] >> >> > "TARGET_SIMD" >> >> > @@ -1406,7 +1406,7 @@ (define_insn "*aarch64_mla_elt" >> >> > (mult:VDQHS >> >> > (vec_duplicate:VDQHS >> >> > (vec_select: >> >> > - (match_operand:VDQHS 1 "register_operand" "") >> >> > + (match_operand:VDQHS 1 "nonimmediate_operand" >> >> "") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")]))) >> >> > (match_operand:VDQHS 3 "register_operand" "w")) >> >> > (match_operand:VDQHS 4 "register_operand" "0")))] @@ -1424,7 >> >> > +1424,7 @@ (define_insn >> >> "*aarch64_mla_elt_" >> >> > (mult:VDQHS >> >> > (vec_duplicate:VDQHS >> >> > (vec_select: >> >> > - (match_operand: 1 "register_operand" >> >> "") >> >> > + (match_operand: 1 >> >> "nonimmediate_operand" "") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")]))) >> >> > (match_operand:VDQHS 3 "register_operand" "w")) >> >> > (match_operand:VDQHS 4 "register_operand" "0")))] @@ -1441,7 >> >> > +1441,7 @@ (define_insn "aarch64_mla_n" >> >> > (plus:VDQHS >> >> > (mult:VDQHS >> >> > (vec_duplicate:VDQHS >> >> > - (match_operand: 3 "register_operand" "")) >> >> > + (match_operand: 3 "nonimmediate_operand" "")) >> >> > (match_operand:VDQHS 2 "register_operand" "w")) >> >> > (match_operand:VDQHS 1 "register_operand" "0")))] >> >> > "TARGET_SIMD" >> >> > @@ -1466,7 +1466,7 @@ (define_insn "*aarch64_mls_elt" >> >> > (mult:VDQHS >> >> > (vec_duplicate:VDQHS >> >> > (vec_select: >> >> > - (match_operand:VDQHS 1 "register_operand" "") >> >> > + (match_operand:VDQHS 1 "nonimmediate_operand" >> >> "") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")]))) >> >> > (match_operand:VDQHS 3 "register_operand" 
"w"))))] >> >> > "TARGET_SIMD" >> >> > @@ -1484,7 +1484,7 @@ (define_insn >> >> "*aarch64_mls_elt_" >> >> > (mult:VDQHS >> >> > (vec_duplicate:VDQHS >> >> > (vec_select: >> >> > - (match_operand: 1 "register_operand" >> >> "") >> >> > + (match_operand: 1 >> >> "nonimmediate_operand" "") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")]))) >> >> > (match_operand:VDQHS 3 "register_operand" "w"))))] >> >> > "TARGET_SIMD" >> >> > @@ -1501,7 +1501,7 @@ (define_insn "aarch64_mls_n" >> >> > (match_operand:VDQHS 1 "register_operand" "0") >> >> > (mult:VDQHS >> >> > (vec_duplicate:VDQHS >> >> > - (match_operand: 3 "register_operand" "")) >> >> > + (match_operand: 3 "nonimmediate_operand" "")) >> >> > (match_operand:VDQHS 2 "register_operand" "w"))))] >> >> > "TARGET_SIMD" >> >> > "mls\t%0., %2., %3.[0]" >> >> > @@ -2882,7 +2882,7 @@ (define_insn "*aarch64_fma4_elt" >> >> > (fma:VDQF >> >> > (vec_duplicate:VDQF >> >> > (vec_select: >> >> > - (match_operand:VDQF 1 "register_operand" "") >> >> > + (match_operand:VDQF 1 "nonimmediate_operand" "") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")]))) >> >> > (match_operand:VDQF 3 "register_operand" "w") >> >> > (match_operand:VDQF 4 "register_operand" "0")))] @@ -2899,7 >> >> > +2899,7 @@ (define_insn >> >> "*aarch64_fma4_elt_" >> >> > (fma:VDQSF >> >> > (vec_duplicate:VDQSF >> >> > (vec_select: >> >> > - (match_operand: 1 "register_operand" >> >> "") >> >> > + (match_operand: 1 "nonimmediate_operand" >> >> "") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")]))) >> >> > (match_operand:VDQSF 3 "register_operand" "w") >> >> > (match_operand:VDQSF 4 "register_operand" "0")))] @@ -2915,7 >> >> > +2915,7 @@ (define_insn "*aarch64_fma4_elt_from_dup" >> >> > [(set (match_operand:VMUL 0 "register_operand" "=3Dw") >> >> > (fma:VMUL >> >> > (vec_duplicate:VMUL >> >> > - (match_operand: 1 "register_operand" "")) >> >> > + (match_operand: 1 "nonimmediate_operand" "")) >> >> > (match_operand:VMUL 2 "register_operand" "w") >> >> > (match_operand:VMUL 3 "register_operand" "0")))] >> >> > "TARGET_SIMD" >> >> > @@ -2927,7 +2927,7 @@ (define_insn "*aarch64_fma4_elt_to_64v2df" >> >> > [(set (match_operand:DF 0 "register_operand" "=3Dw") >> >> > (fma:DF >> >> > (vec_select:DF >> >> > - (match_operand:V2DF 1 "register_operand" "w") >> >> > + (match_operand:V2DF 1 "nonimmediate_operand" "w") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")])) >> >> > (match_operand:DF 3 "register_operand" "w") >> >> > (match_operand:DF 4 "register_operand" "0")))] @@ -2957,7 >> >> > +2957,7 @@ (define_insn "*aarch64_fnma4_elt" >> >> > (match_operand:VDQF 3 "register_operand" "w")) >> >> > (vec_duplicate:VDQF >> >> > (vec_select: >> >> > - (match_operand:VDQF 1 "register_operand" "") >> >> > + (match_operand:VDQF 1 "nonimmediate_operand" "") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")]))) >> >> > (match_operand:VDQF 4 "register_operand" "0")))] >> >> > "TARGET_SIMD" >> >> > @@ -2975,7 +2975,7 @@ (define_insn >> >> "*aarch64_fnma4_elt_" >> >> > (match_operand:VDQSF 3 "register_operand" "w")) >> >> > (vec_duplicate:VDQSF >> >> > (vec_select: >> >> > - (match_operand: 1 "register_operand" >> >> "") >> >> > + (match_operand: 1 "nonimmediate_operand" >> >> "") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")]))) >> >> > (match_operand:VDQSF 4 "register_operand" "0")))] >> >> > "TARGET_SIMD" >> >> > @@ -2992,7 +2992,7 @@ (define_insn >> >> "*aarch64_fnma4_elt_from_dup" >> >> > (neg:VMUL >> >> > (match_operand:VMUL 2 
"register_operand" "w")) >> >> > (vec_duplicate:VMUL >> >> > - (match_operand: 1 "register_operand" "")) >> >> > + (match_operand: 1 "nonimmediate_operand" "")) >> >> > (match_operand:VMUL 3 "register_operand" "0")))] >> >> > "TARGET_SIMD" >> >> > "fmls\t%0., %2., %1.[0]" >> >> > @@ -3003,7 +3003,7 @@ (define_insn >> "*aarch64_fnma4_elt_to_64v2df" >> >> > [(set (match_operand:DF 0 "register_operand" "=3Dw") >> >> > (fma:DF >> >> > (vec_select:DF >> >> > - (match_operand:V2DF 1 "register_operand" "w") >> >> > + (match_operand:V2DF 1 "nonimmediate_operand" "w") >> >> > (parallel [(match_operand:SI 2 "immediate_operand")])) >> >> > (neg:DF >> >> > (match_operand:DF 3 "register_operand" "w")) @@ -4934,7 >> >> > +4934,7 @@ (define_insn >> >> "*aarch64_mulx_elt_" >> >> > [(match_operand:VDQSF 1 "register_operand" "w") >> >> > (vec_duplicate:VDQSF >> >> > (vec_select: >> >> > - (match_operand: 2 "register_operand" "w") >> >> > + (match_operand: 2 "nonimmediate_operand" >> >> "w") >> >> > (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] >> >> > UNSPEC_FMULX))] >> >> > "TARGET_SIMD" >> >> > @@ -4953,7 +4953,7 @@ (define_insn "*aarch64_mulx_elt" >> >> > [(match_operand:VDQF 1 "register_operand" "w") >> >> > (vec_duplicate:VDQF >> >> > (vec_select: >> >> > - (match_operand:VDQF 2 "register_operand" "w") >> >> > + (match_operand:VDQF 2 "nonimmediate_operand" "w") >> >> > (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] >> >> > UNSPEC_FMULX))] >> >> > "TARGET_SIMD" >> >> > @@ -4971,7 +4971,7 @@ (define_insn >> >> "*aarch64_mulx_elt_from_dup" >> >> > (unspec:VHSDF >> >> > [(match_operand:VHSDF 1 "register_operand" "w") >> >> > (vec_duplicate:VHSDF >> >> > - (match_operand: 2 "register_operand" ""))] >> >> > + (match_operand: 2 "nonimmediate_operand" ""))] >> >> > UNSPEC_FMULX))] >> >> > "TARGET_SIMD" >> >> > "fmulx\t%0., %1., %2.[0]"; @@ -4987,7 >> >> +4987,7 >> >> > @@ (define_insn "*aarch64_vgetfmulx" >> >> > (unspec: >> >> > [(match_operand: 1 "register_operand" "w") >> >> > (vec_select: >> >> > - (match_operand:VDQF 2 "register_operand" "w") >> >> > + (match_operand:VDQF 2 "nonimmediate_operand" "w") >> >> > (parallel [(match_operand:SI 3 "immediate_operand" "i")]))] >> >> > UNSPEC_FMULX))] >> >> > "TARGET_SIMD" >> >> > @@ -6768,7 +6768,7 @@ (define_insn "*sqrt2" >> >> > (define_insn "aarch64_simd_ld2" >> >> > [(set (match_operand:VSTRUCT_2Q 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_2Q [ >> >> > - (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + (match_operand:VSTRUCT_2Q 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD2))] >> >> > "TARGET_SIMD" >> >> > "ld2\\t{%S0. - %T0.}, %1" >> >> > @@ -6778,7 +6778,7 @@ (define_insn "aarch64_simd_ld2" >> >> > (define_insn "aarch64_simd_ld2r" >> >> > [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_2QD [ >> >> > - (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")] >> >> > + (match_operand:BLK 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD2_DUP))] >> >> > "TARGET_SIMD" >> >> > "ld2r\\t{%S0. 
- %T0.}, %1" >> >> > @@ -6788,7 +6788,7 @@ (define_insn >> "aarch64_simd_ld2r" >> >> > (define_insn "aarch64_vec_load_lanes_lane" >> >> > [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_2QD [ >> >> > - (match_operand:BLK 1 "aarch64_simd_struct_operand" >> >> "Utv") >> >> > + (match_operand:BLK 1 "memory_operand" "Utv") >> >> > (match_operand:VSTRUCT_2QD 2 "register_operand" "0") >> >> > (match_operand:SI 3 "immediate_operand" "i")] >> >> > UNSPEC_LD2_LANE))] >> >> > @@ -6804,7 +6804,7 @@ (define_insn >> >> "aarch64_vec_load_lanes_lane" >> >> > (define_expand "vec_load_lanes" >> >> > [(set (match_operand:VSTRUCT_2Q 0 "register_operand") >> >> > (unspec:VSTRUCT_2Q [ >> >> > - (match_operand:VSTRUCT_2Q 1 >> >> "aarch64_simd_struct_operand")] >> >> > + (match_operand:VSTRUCT_2Q 1 "memory_operand")] >> >> > UNSPEC_LD2))] >> >> > "TARGET_SIMD" >> >> > { >> >> > @@ -6822,7 +6822,7 @@ (define_expand >> >> "vec_load_lanes" >> >> > }) >> >> > >> >> > (define_insn "aarch64_simd_st2" >> >> > - [(set (match_operand:VSTRUCT_2Q 0 >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_2Q 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_2Q [ >> >> > (match_operand:VSTRUCT_2Q 1 "register_operand" "w")] >> >> > UNSPEC_ST2))] >> >> > @@ -6833,7 +6833,7 @@ (define_insn "aarch64_simd_st2" >> >> > >> >> > ;; RTL uses GCC vector extension indices, so flip only for assembl= y. >> >> > (define_insn "aarch64_vec_store_lanes_lane" >> >> > - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=3DUtv= ") >> >> > + [(set (match_operand:BLK 0 "memory_operand" "=3DUtv") >> >> > (unspec:BLK [(match_operand:VSTRUCT_2QD 1 "register_operand" >> >> "w") >> >> > (match_operand:SI 2 "immediate_operand" "i")] >> >> > UNSPEC_ST2_LANE))] >> >> > @@ -6847,7 +6847,7 @@ (define_insn >> >> "aarch64_vec_store_lanes_lane" >> >> > ) >> >> > >> >> > (define_expand "vec_store_lanes" >> >> > - [(set (match_operand:VSTRUCT_2Q 0 >> "aarch64_simd_struct_operand") >> >> > + [(set (match_operand:VSTRUCT_2Q 0 "memory_operand") >> >> > (unspec:VSTRUCT_2Q [(match_operand:VSTRUCT_2Q 1 >> >> "register_operand")] >> >> > UNSPEC_ST2))] >> >> > "TARGET_SIMD" >> >> > @@ -6868,7 +6868,7 @@ (define_expand >> >> "vec_store_lanes" >> >> > (define_insn "aarch64_simd_ld3" >> >> > [(set (match_operand:VSTRUCT_3Q 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_3Q [ >> >> > - (match_operand:VSTRUCT_3Q 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + (match_operand:VSTRUCT_3Q 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD3))] >> >> > "TARGET_SIMD" >> >> > "ld3\\t{%S0. - %U0.}, %1" >> >> > @@ -6878,7 +6878,7 @@ (define_insn "aarch64_simd_ld3" >> >> > (define_insn "aarch64_simd_ld3r" >> >> > [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_3QD [ >> >> > - (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")] >> >> > + (match_operand:BLK 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD3_DUP))] >> >> > "TARGET_SIMD" >> >> > "ld3r\\t{%S0. 
- %U0.}, %1" >> >> > @@ -6888,7 +6888,7 @@ (define_insn >> "aarch64_simd_ld3r" >> >> > (define_insn "aarch64_vec_load_lanes_lane" >> >> > [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_3QD [ >> >> > - (match_operand:BLK 1 "aarch64_simd_struct_operand" >> >> "Utv") >> >> > + (match_operand:BLK 1 "memory_operand" "Utv") >> >> > (match_operand:VSTRUCT_3QD 2 "register_operand" "0") >> >> > (match_operand:SI 3 "immediate_operand" "i")] >> >> > UNSPEC_LD3_LANE))] >> >> > @@ -6904,7 +6904,7 @@ (define_insn >> >> "aarch64_vec_load_lanes_lane" >> >> > (define_expand "vec_load_lanes" >> >> > [(set (match_operand:VSTRUCT_3Q 0 "register_operand") >> >> > (unspec:VSTRUCT_3Q [ >> >> > - (match_operand:VSTRUCT_3Q 1 >> >> "aarch64_simd_struct_operand")] >> >> > + (match_operand:VSTRUCT_3Q 1 "memory_operand")] >> >> > UNSPEC_LD3))] >> >> > "TARGET_SIMD" >> >> > { >> >> > @@ -6922,7 +6922,7 @@ (define_expand >> >> "vec_load_lanes" >> >> > }) >> >> > >> >> > (define_insn "aarch64_simd_st3" >> >> > - [(set (match_operand:VSTRUCT_3Q 0 >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_3Q 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_3Q [(match_operand:VSTRUCT_3Q 1 >> >> "register_operand" "w")] >> >> > UNSPEC_ST3))] >> >> > "TARGET_SIMD" >> >> > @@ -6932,7 +6932,7 @@ (define_insn "aarch64_simd_st3" >> >> > >> >> > ;; RTL uses GCC vector extension indices, so flip only for assembl= y. >> >> > (define_insn "aarch64_vec_store_lanes_lane" >> >> > - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=3DUtv= ") >> >> > + [(set (match_operand:BLK 0 "memory_operand" "=3DUtv") >> >> > (unspec:BLK [(match_operand:VSTRUCT_3QD 1 "register_operand" >> >> "w") >> >> > (match_operand:SI 2 "immediate_operand" "i")] >> >> > UNSPEC_ST3_LANE))] >> >> > @@ -6946,7 +6946,7 @@ (define_insn >> >> "aarch64_vec_store_lanes_lane" >> >> > ) >> >> > >> >> > (define_expand "vec_store_lanes" >> >> > - [(set (match_operand:VSTRUCT_3Q 0 >> "aarch64_simd_struct_operand") >> >> > + [(set (match_operand:VSTRUCT_3Q 0 "memory_operand") >> >> > (unspec:VSTRUCT_3Q [ >> >> > (match_operand:VSTRUCT_3Q 1 "register_operand")] >> >> > UNSPEC_ST3))] >> >> > @@ -6968,7 +6968,7 @@ (define_expand >> >> "vec_store_lanes" >> >> > (define_insn "aarch64_simd_ld4" >> >> > [(set (match_operand:VSTRUCT_4Q 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_4Q [ >> >> > - (match_operand:VSTRUCT_4Q 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + (match_operand:VSTRUCT_4Q 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD4))] >> >> > "TARGET_SIMD" >> >> > "ld4\\t{%S0. - %V0.}, %1" >> >> > @@ -6978,7 +6978,7 @@ (define_insn "aarch64_simd_ld4" >> >> > (define_insn "aarch64_simd_ld4r" >> >> > [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_4QD [ >> >> > - (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")] >> >> > + (match_operand:BLK 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD4_DUP))] >> >> > "TARGET_SIMD" >> >> > "ld4r\\t{%S0. 
- %V0.}, %1" >> >> > @@ -6988,7 +6988,7 @@ (define_insn >> "aarch64_simd_ld4r" >> >> > (define_insn "aarch64_vec_load_lanes_lane" >> >> > [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_4QD [ >> >> > - (match_operand:BLK 1 "aarch64_simd_struct_operand" >> >> "Utv") >> >> > + (match_operand:BLK 1 "memory_operand" "Utv") >> >> > (match_operand:VSTRUCT_4QD 2 "register_operand" "0") >> >> > (match_operand:SI 3 "immediate_operand" "i")] >> >> > UNSPEC_LD4_LANE))] >> >> > @@ -7004,7 +7004,7 @@ (define_insn >> >> "aarch64_vec_load_lanes_lane" >> >> > (define_expand "vec_load_lanes" >> >> > [(set (match_operand:VSTRUCT_4Q 0 "register_operand") >> >> > (unspec:VSTRUCT_4Q [ >> >> > - (match_operand:VSTRUCT_4Q 1 >> >> "aarch64_simd_struct_operand")] >> >> > + (match_operand:VSTRUCT_4Q 1 "memory_operand")] >> >> > UNSPEC_LD4))] >> >> > "TARGET_SIMD" >> >> > { >> >> > @@ -7022,7 +7022,7 @@ (define_expand >> >> "vec_load_lanes" >> >> > }) >> >> > >> >> > (define_insn "aarch64_simd_st4" >> >> > - [(set (match_operand:VSTRUCT_4Q 0 >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_4Q 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_4Q [ >> >> > (match_operand:VSTRUCT_4Q 1 "register_operand" "w")] >> >> > UNSPEC_ST4))] >> >> > @@ -7033,7 +7033,7 @@ (define_insn "aarch64_simd_st4" >> >> > >> >> > ;; RTL uses GCC vector extension indices, so flip only for assembl= y. >> >> > (define_insn "aarch64_vec_store_lanes_lane" >> >> > - [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=3DUtv= ") >> >> > + [(set (match_operand:BLK 0 "memory_operand" "=3DUtv") >> >> > (unspec:BLK [(match_operand:VSTRUCT_4QD 1 "register_operand" >> >> "w") >> >> > (match_operand:SI 2 "immediate_operand" "i")] >> >> > UNSPEC_ST4_LANE))] >> >> > @@ -7047,7 +7047,7 @@ (define_insn >> >> "aarch64_vec_store_lanes_lane" >> >> > ) >> >> > >> >> > (define_expand "vec_store_lanes" >> >> > - [(set (match_operand:VSTRUCT_4Q 0 >> "aarch64_simd_struct_operand") >> >> > + [(set (match_operand:VSTRUCT_4Q 0 "memory_operand") >> >> > (unspec:VSTRUCT_4Q [(match_operand:VSTRUCT_4Q 1 >> >> "register_operand")] >> >> > UNSPEC_ST4))] >> >> > "TARGET_SIMD" >> >> > @@ -7138,7 +7138,7 @@ (define_expand "aarch64_ld1x3" >> >> > (define_insn "aarch64_ld1_x3_" >> >> > [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_3QD >> >> > - [(match_operand:VSTRUCT_3QD 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + [(match_operand:VSTRUCT_3QD 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD1))] >> >> > "TARGET_SIMD" >> >> > "ld1\\t{%S0. - %U0.}, %1" >> >> > @@ -7158,7 +7158,7 @@ (define_expand "aarch64_ld1x4" >> >> > (define_insn "aarch64_ld1_x4_" >> >> > [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_4QD >> >> > - [(match_operand:VSTRUCT_4QD 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + [(match_operand:VSTRUCT_4QD 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD1))] >> >> > "TARGET_SIMD" >> >> > "ld1\\t{%S0. 
- %V0.}, %1" >> >> > @@ -7176,7 +7176,7 @@ (define_expand "aarch64_st1x2" >> >> > }) >> >> > >> >> > (define_insn "aarch64_st1_x2_" >> >> > - [(set (match_operand:VSTRUCT_2QD 0 >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_2QD 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_2QD >> >> > [(match_operand:VSTRUCT_2QD 1 "register_operand" "w")] >> >> > UNSPEC_ST1))] >> >> > @@ -7196,7 +7196,7 @@ (define_expand "aarch64_st1x3" >> >> > }) >> >> > >> >> > (define_insn "aarch64_st1_x3_" >> >> > - [(set (match_operand:VSTRUCT_3QD 0 >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_3QD 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_3QD >> >> > [(match_operand:VSTRUCT_3QD 1 "register_operand" "w")] >> >> > UNSPEC_ST1))] >> >> > @@ -7216,7 +7216,7 @@ (define_expand "aarch64_st1x4" >> >> > }) >> >> > >> >> > (define_insn "aarch64_st1_x4_" >> >> > - [(set (match_operand:VSTRUCT_4QD 0 >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_4QD 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_4QD >> >> > [(match_operand:VSTRUCT_4QD 1 "register_operand" "w")] >> >> > UNSPEC_ST1))] >> >> > @@ -7268,7 +7268,7 @@ (define_insn "*aarch64_movv8di" >> >> > (define_insn "aarch64_be_ld1" >> >> > [(set (match_operand:VALLDI_F16 0 "register_operand" "=3Dw") >> >> > (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 >> >> > - "aarch64_simd_struct_operand" "Utv")] >> >> > + "memory_operand" "Utv")] >> >> > UNSPEC_LD1))] >> >> > "TARGET_SIMD" >> >> > "ld1\\t{%0}, %1" >> >> > @@ -7276,7 +7276,7 @@ (define_insn "aarch64_be_ld1" >> >> > ) >> >> > >> >> > (define_insn "aarch64_be_st1" >> >> > - [(set (match_operand:VALLDI_F16 0 "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VALLDI_F16 0 "memory_operand" "=3DUtv") >> >> > (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 >> >> "register_operand" "w")] >> >> > UNSPEC_ST1))] >> >> > "TARGET_SIMD" >> >> > @@ -7551,7 +7551,7 @@ (define_expand >> >> "aarch64_ldr" >> >> > (define_insn "aarch64_ld2_dreg" >> >> > [(set (match_operand:VSTRUCT_2DNX 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_2DNX [ >> >> > - (match_operand:VSTRUCT_2DNX 1 >> >> "aarch64_simd_struct_operand" "Utv")] >> >> > + (match_operand:VSTRUCT_2DNX 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD2_DREG))] >> >> > "TARGET_SIMD" >> >> > "ld2\\t{%S0. - %T0.}, %1" >> >> > @@ -7561,7 +7561,7 @@ (define_insn "aarch64_ld2_dreg" >> >> > (define_insn "aarch64_ld2_dreg" >> >> > [(set (match_operand:VSTRUCT_2DX 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_2DX [ >> >> > - (match_operand:VSTRUCT_2DX 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + (match_operand:VSTRUCT_2DX 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD2_DREG))] >> >> > "TARGET_SIMD" >> >> > "ld1\\t{%S0.1d - %T0.1d}, %1" >> >> > @@ -7571,7 +7571,7 @@ (define_insn "aarch64_ld2_dreg" >> >> > (define_insn "aarch64_ld3_dreg" >> >> > [(set (match_operand:VSTRUCT_3DNX 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_3DNX [ >> >> > - (match_operand:VSTRUCT_3DNX 1 >> >> "aarch64_simd_struct_operand" "Utv")] >> >> > + (match_operand:VSTRUCT_3DNX 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD3_DREG))] >> >> > "TARGET_SIMD" >> >> > "ld3\\t{%S0. 
- %U0.}, %1" >> >> > @@ -7581,7 +7581,7 @@ (define_insn "aarch64_ld3_dreg" >> >> > (define_insn "aarch64_ld3_dreg" >> >> > [(set (match_operand:VSTRUCT_3DX 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_3DX [ >> >> > - (match_operand:VSTRUCT_3DX 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + (match_operand:VSTRUCT_3DX 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD3_DREG))] >> >> > "TARGET_SIMD" >> >> > "ld1\\t{%S0.1d - %U0.1d}, %1" >> >> > @@ -7591,7 +7591,7 @@ (define_insn "aarch64_ld3_dreg" >> >> > (define_insn "aarch64_ld4_dreg" >> >> > [(set (match_operand:VSTRUCT_4DNX 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_4DNX [ >> >> > - (match_operand:VSTRUCT_4DNX 1 >> >> "aarch64_simd_struct_operand" "Utv")] >> >> > + (match_operand:VSTRUCT_4DNX 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD4_DREG))] >> >> > "TARGET_SIMD" >> >> > "ld4\\t{%S0. - %V0.}, %1" >> >> > @@ -7601,7 +7601,7 @@ (define_insn "aarch64_ld4_dreg" >> >> > (define_insn "aarch64_ld4_dreg" >> >> > [(set (match_operand:VSTRUCT_4DX 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_4DX [ >> >> > - (match_operand:VSTRUCT_4DX 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + (match_operand:VSTRUCT_4DX 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD4_DREG))] >> >> > "TARGET_SIMD" >> >> > "ld1\\t{%S0.1d - %V0.1d}, %1" >> >> > @@ -7841,7 +7841,7 @@ (define_insn >> >> "aarch64_rev" >> >> > ) >> >> > >> >> > (define_insn "aarch64_st2_dreg" >> >> > - [(set (match_operand:VSTRUCT_2DNX 0 >> >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_2DNX 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_2DNX [ >> >> > (match_operand:VSTRUCT_2DNX 1 "register_operand" "w")] >> >> > UNSPEC_ST2))] >> >> > @@ -7851,7 +7851,7 @@ (define_insn "aarch64_st2_dreg" >> >> > ) >> >> > >> >> > (define_insn "aarch64_st2_dreg" >> >> > - [(set (match_operand:VSTRUCT_2DX 0 >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_2DX 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_2DX [ >> >> > (match_operand:VSTRUCT_2DX 1 "register_operand" "w")] >> >> > UNSPEC_ST2))] >> >> > @@ -7861,7 +7861,7 @@ (define_insn "aarch64_st2_dreg" >> >> > ) >> >> > >> >> > (define_insn "aarch64_st3_dreg" >> >> > - [(set (match_operand:VSTRUCT_3DNX 0 >> >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_3DNX 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_3DNX [ >> >> > (match_operand:VSTRUCT_3DNX 1 "register_operand" "w")] >> >> > UNSPEC_ST3))] >> >> > @@ -7871,7 +7871,7 @@ (define_insn "aarch64_st3_dreg" >> >> > ) >> >> > >> >> > (define_insn "aarch64_st3_dreg" >> >> > - [(set (match_operand:VSTRUCT_3DX 0 >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_3DX 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_3DX [ >> >> > (match_operand:VSTRUCT_3DX 1 "register_operand" "w")] >> >> > UNSPEC_ST3))] >> >> > @@ -7881,7 +7881,7 @@ (define_insn "aarch64_st3_dreg" >> >> > ) >> >> > >> >> > (define_insn "aarch64_st4_dreg" >> >> > - [(set (match_operand:VSTRUCT_4DNX 0 >> >> "aarch64_simd_struct_operand" >> >> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_4DNX 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_4DNX [ >> >> > (match_operand:VSTRUCT_4DNX 1 "register_operand" "w")] >> >> > UNSPEC_ST4))] >> >> > @@ -7891,7 +7891,7 @@ (define_insn "aarch64_st4_dreg" >> >> > ) >> >> > >> >> > (define_insn "aarch64_st4_dreg" >> >> > - [(set (match_operand:VSTRUCT_4DX 0 >> "aarch64_simd_struct_operand" >> 
>> > "=3DUtv") >> >> > + [(set (match_operand:VSTRUCT_4DX 0 "memory_operand" "=3DUtv") >> >> > (unspec:VSTRUCT_4DX [ >> >> > (match_operand:VSTRUCT_4DX 1 "register_operand" "w")] >> >> > UNSPEC_ST4))] >> >> > @@ -7974,7 +7974,7 @@ (define_expand "vec_init" >> >> > (define_insn "*aarch64_simd_ld1r" >> >> > [(set (match_operand:VALL_F16 0 "register_operand" "=3Dw") >> >> > (vec_duplicate:VALL_F16 >> >> > - (match_operand: 1 "aarch64_simd_struct_operand" "Utv")))] >> >> > + (match_operand: 1 "memory_operand" "Utv")))] >> >> > "TARGET_SIMD" >> >> > "ld1r\\t{%0.}, %1" >> >> > [(set_attr "type" "neon_load1_all_lanes")] @@ -7983,7 +7983,7 @@ >> >> > (define_insn "*aarch64_simd_ld1r" >> >> > (define_insn "aarch64_simd_ld1_x2" >> >> > [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=3Dw") >> >> > (unspec:VSTRUCT_2QD [ >> >> > - (match_operand:VSTRUCT_2QD 1 "aarch64_simd_struct_operand" >> >> "Utv")] >> >> > + (match_operand:VSTRUCT_2QD 1 "memory_operand" "Utv")] >> >> > UNSPEC_LD1))] >> >> > "TARGET_SIMD" >> >> > "ld1\\t{%S0. - %T0.}, %1" >> >> > diff --git a/gcc/config/aarch64/predicates.md >> >> > b/gcc/config/aarch64/predicates.md >> >> > index >> >> > >> >> >> c308015ac2c13d24cd6bcec71247ec45df8cf5e6..6b70a364530c8108457091bfec >> >> 12 >> >> > fe549f722149 100644 >> >> > --- a/gcc/config/aarch64/predicates.md >> >> > +++ b/gcc/config/aarch64/predicates.md >> >> > @@ -494,10 +494,6 @@ (define_predicate >> >> "aarch64_simd_reg_or_minus_one" >> >> > (ior (match_operand 0 "register_operand") >> >> > (match_operand 0 "aarch64_simd_imm_minus_one"))) >> >> > >> >> > -(define_predicate "aarch64_simd_struct_operand" >> >> > - (and (match_code "mem") >> >> > - (match_test "TARGET_SIMD && aarch64_simd_mem_operand_p >> >> (op)"))) >> >> > - >> >> > ;; Like general_operand but allow only valid SIMD addressing modes. >> >> > (define_predicate "aarch64_simd_general_operand" >> >> > (and (match_operand 0 "general_operand") diff --git >> >> > a/gcc/testsuite/gcc.target/aarch64/vld1r.c >> >> > b/gcc/testsuite/gcc.target/aarch64/vld1r.c >> >> > new file mode 100644 >> >> > index >> >> > >> >> >> 0000000000000000000000000000000000000000..72c505c403e9e239771379b7ca >> >> dd >> >> > 8a9473f06113 >> >> > --- /dev/null >> >> > +++ b/gcc/testsuite/gcc.target/aarch64/vld1r.c >> >> > @@ -0,0 +1,26 @@ >> >> > +/* { dg-do compile } */ >> >> > +/* { dg-additional-options "-O" } */ >> >> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } >> >> > +} } */ >> >> > + >> >> > +#include >> >> > + >> >> > +/* >> >> > +** f1: >> >> > +** add x0, x0, 1 >> >> > +** ld1r {v0.8b}, \[x0\] >> >> > +** ret >> >> > +*/ >> >> > +uint8x8_t f1(const uint8_t *in) { >> >> > + return vld1_dup_u8(&in[1]); >> >> > +} >> >> > + >> >> > +/* >> >> > +** f2: >> >> > +** ldr s1, \[x0, 4\] >> >> > +** fmla v0.4s, v0.4s, v1.s\[0\] >> >> > +** ret >> >> > +*/ >> >> > +float32x4_t f2(const float32_t *in, float32x4_t a) { >> >> > + float32x4_t dup =3D vld1q_dup_f32(&in[1]); >> >> > + return vfmaq_laneq_f32 (a, a, dup, 1); }