public inbox for gcc-patches@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <tamar.christina@arm.com>
Cc: gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com,
	Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com
Subject: Re: [PATCH]AArch64 relax predicate on load structure load instructions
Date: Wed, 08 Jun 2022 11:31:22 +0100	[thread overview]
Message-ID: <mptbkv3igzp.fsf@arm.com> (raw)
In-Reply-To: <patch-15781-tamar@arm.com> (Tamar Christina's message of "Wed, 8 Jun 2022 10:38:20 +0100")

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> At some point in time we started lowering the ld1r instructions in gimple.
>
> That is:
>
> uint8x8_t f1(const uint8_t *in) {
>     return vld1_dup_u8(&in[1]);
> }
>
> generates at gimple:
>
>   _3 = MEM[(const uint8_t *)in_1(D) + 1B];
>   _4 = {_3, _3, _3, _3, _3, _3, _3, _3};
>
> Which is good, but we then generate:
>
> f1:
> 	ldr     b0, [x0, 1]
> 	dup     v0.8b, v0.b[0]
> 	ret
>
> instead of ld1r.
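>
> (For reference, the hoped-for code is something like the following; ld1r
> only supports base-register addressing, so the offset has to be folded
> into the base first:)
>
> f1:
> 	add	x0, x0, 1
> 	ld1r	{v0.8b}, [x0]
> 	ret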
>
> The reason is that the load instructions have too restrictive a predicate,
> which prevents combine from merging the instructions because the predicate
> only accepts simple addressing modes.
>
> This patch relaxes the predicate to accept any memory operand and relies on
> LRA to legitimize the address when needed, since the constraint still only
> allows the simple addressing modes.  Reload is always able to legitimize
> addresses into that form.
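>
> For example (illustrative RTL only), LRA can legitimize
>
>   (mem:V8QI (plus:DI (reg:DI x0) (const_int 1)))
>
> by moving the address into a fresh base register:
>
>   (set (reg:DI tmp) (plus:DI (reg:DI x0) (const_int 1)))
>   ... (mem:V8QI (reg:DI tmp)) ...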
>
> Secondly, now that we actually generate more ld1r instructions, it became
> clear that the lane instructions suffer from a similar issue.
>
> i.e.
>
> float32x4_t f2(const float32_t *in, float32x4_t a) {
>     float32x4_t dup = vld1q_dup_f32(&in[1]);
>     return vfmaq_laneq_f32 (a, a, dup, 1);
> }
>
> would generate ld1r + vector fmla instead of ldr + lane fmla.
>
> The reason is similar to the ld1r issue: the predicate is too restrictive,
> accepting only register operands but not memory.
>
> This patch relaxes it to accept registers as well as memory while leaving
> the constraint to accept only registers.  LRA will then generate a reload
> if needed, forcing the memory operand into a register using the standard
> patterns.
>
> These two changes allow combine and reload to generate the right sequences.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

This is going against the general direction of travel, which is to make
the instruction's predicates and conditions enforce the constraints as
much as possible (making optimistic assumptions about pseudo registers).

The RA *can* deal with things like:

  (match_operand:M N "general_operand" "r")

but it's best avoided, for a few reasons:

(1) The fix-up will be done in LRA, so IRA will not see the temporary
    registers.  This can make the allocation of those temporaries
    suboptimal but (more importantly) it might require other
    previously-allocated registers to be spilled late due to the
    unexpected increase in register pressure.

(2) It ends up hiding instructions from the pre-RA optimisers.

(3) It can also prevent combine opportunities (as well as create them),
    unless the loose predicates in an insn I are propagated to all
    patterns that might result from combining I with something else.

It sounds like the first problem (not generating ld1r) could be fixed
by (a) combining aarch64_simd_dup<mode> and *aarch64_simd_ld1r<mode>,
so that the register and memory alternatives are in the same pattern
and (b) using the merged instruction(s) to implement the vec_duplicate
optab.  Target-independent code should then make the address satisfy
the predicate, simplifying the address where necessary.
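For the sake of illustration, (a) might look something like this (completely
untested; the exact name, modes and attributes would need checking):

  (define_insn "aarch64_simd_dup<mode>"
    [(set (match_operand:VALL_F16 0 "register_operand" "=w,w")
	  (vec_duplicate:VALL_F16
	    (match_operand:<VEL> 1 "nonimmediate_operand" "w,Utv")))]
    "TARGET_SIMD"
    "@
     dup\\t%0.<Vtype>, %1.<Vetype>[0]
     ld1r\\t{%0.<Vtype>}, %1"
    [(set_attr "type" "neon_dup<q>,neon_load1_all_lanes")]
  )

with the vec_duplicate<mode> optab then expanding to this pattern.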

I'm not sure whether fixing the ld1r problem that way will avoid the
vfmaq_laneq_f32 problem; let me know if not.

Thanks,
Richard

> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-simd.md (mul_lane<mode>3, mul_laneq<mode>3,
> 	mul_n<mode>3, *aarch64_mul3_elt_to_64v2df, *aarch64_mla_elt<mode>,
> 	*aarch64_mla_elt_<vswap_width_name><mode>, aarch64_mla_n<mode>,
> 	*aarch64_mls_elt<mode>, *aarch64_mls_elt_<vswap_width_name><mode>,
> 	aarch64_mls_n<mode>, *aarch64_fma4_elt<mode>,
> 	*aarch64_fma4_elt_<vswap_width_name><mode>,
> 	*aarch64_fma4_elt_from_dup<mode>, *aarch64_fma4_elt_to_64v2df,
> 	*aarch64_fnma4_elt<mode>, *aarch64_fnma4_elt_<vswap_width_name><mode>,
> 	*aarch64_fnma4_elt_from_dup<mode>, *aarch64_fnma4_elt_to_64v2df,
> 	*aarch64_mulx_elt_<vswap_width_name><mode>,
> 	*aarch64_mulx_elt<mode>, *aarch64_mulx_elt_from_dup<mode>,
> 	*aarch64_vgetfmulx<mode>): Relax register_operand to
> 	nonimmediate_operand.
> 	(aarch64_simd_ld2<vstruct_elt>, aarch64_simd_ld2r<vstruct_elt>,
> 	aarch64_vec_load_lanes<mode>_lane<vstruct_elt>,
> 	vec_load_lanes<mode><vstruct_elt>, aarch64_simd_st2<vstruct_elt>,
> 	aarch64_vec_store_lanes<mode>_lane<vstruct_elt>,
> 	vec_store_lanes<mode><vstruct_elt>, aarch64_simd_ld3<vstruct_elt>,
> 	aarch64_simd_ld3r<vstruct_elt>,
> 	aarch64_vec_load_lanes<mode>_lane<vstruct_elt>,
> 	vec_load_lanes<mode><vstruct_elt>, aarch64_simd_st3<vstruct_elt>,
> 	aarch64_vec_store_lanes<mode>_lane<vstruct_elt>,
> 	vec_store_lanes<mode><vstruct_elt>, aarch64_simd_ld4<vstruct_elt>,
> 	aarch64_simd_ld4r<vstruct_elt>,
> 	aarch64_vec_load_lanes<mode>_lane<vstruct_elt>,
> 	vec_load_lanes<mode><vstruct_elt>, aarch64_simd_st4<vstruct_elt>,
> 	aarch64_vec_store_lanes<mode>_lane<vstruct_elt>,
> 	vec_store_lanes<mode><vstruct_elt>, aarch64_ld1_x3_<vstruct_elt>,
> 	aarch64_ld1_x4_<vstruct_elt>, aarch64_st1_x2_<vstruct_elt>,
> 	aarch64_st1_x3_<vstruct_elt>, aarch64_st1_x4_<vstruct_elt>,
> 	aarch64_be_ld1<mode>, aarch64_be_st1<mode>,
> 	aarch64_ld2<vstruct_elt>_dreg, aarch64_ld2<vstruct_elt>_dreg,
> 	aarch64_ld3<vstruct_elt>_dreg, aarch64_ld3<vstruct_elt>_dreg,
> 	aarch64_ld4<vstruct_elt>_dreg, aarch64_ld4<vstruct_elt>_dreg,
> 	aarch64_st2<vstruct_elt>_dreg, aarch64_st2<vstruct_elt>_dreg,
> 	aarch64_st3<vstruct_elt>_dreg, aarch64_st3<vstruct_elt>_dreg,
> 	aarch64_st4<vstruct_elt>_dreg, aarch64_st4<vstruct_elt>_dreg,
> 	*aarch64_simd_ld1r<mode>, aarch64_simd_ld1<vstruct_elt>_x2): Relax
> 	aarch64_simd_struct_operand to memory_operand.
> 	* config/aarch64/predicates.md (aarch64_simd_struct_operand): Remove.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/vld1r.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index be5c70bbb7520ae93d19c4a432ce34863e5b9a64..24e3274ddda2ea76c83571fada8ff4c953b752a1 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -712,7 +712,7 @@ (define_insn "mul_lane<mode>3"
>         (mult:VMULD
>  	 (vec_duplicate:VMULD
>  	   (vec_select:<VEL>
> -	     (match_operand:<VCOND> 2 "register_operand" "<h_con>")
> +	     (match_operand:<VCOND> 2 "nonimmediate_operand" "<h_con>")
>  	     (parallel [(match_operand:SI 3 "immediate_operand" "i")])))
>  	 (match_operand:VMULD 1 "register_operand" "w")))]
>    "TARGET_SIMD"
> @@ -728,7 +728,7 @@ (define_insn "mul_laneq<mode>3"
>       (mult:VMUL
>         (vec_duplicate:VMUL
>  	  (vec_select:<VEL>
> -	    (match_operand:<VCONQ> 2 "register_operand" "<h_con>")
> +	    (match_operand:<VCONQ> 2 "nonimmediate_operand" "<h_con>")
>  	    (parallel [(match_operand:SI 3 "immediate_operand")])))
>        (match_operand:VMUL 1 "register_operand" "w")))]
>    "TARGET_SIMD"
> @@ -743,7 +743,7 @@ (define_insn "mul_n<mode>3"
>   [(set (match_operand:VMUL 0 "register_operand" "=w")
>         (mult:VMUL
>  	 (vec_duplicate:VMUL
> -	   (match_operand:<VEL> 2 "register_operand" "<h_con>"))
> +	   (match_operand:<VEL> 2 "nonimmediate_operand" "<h_con>"))
>  	 (match_operand:VMUL 1 "register_operand" "w")))]
>    "TARGET_SIMD"
>    "<f>mul\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]";
> @@ -789,7 +789,7 @@ (define_insn "*aarch64_mul3_elt_to_64v2df"
>    [(set (match_operand:DF 0 "register_operand" "=w")
>       (mult:DF
>         (vec_select:DF
> -	 (match_operand:V2DF 1 "register_operand" "w")
> +	 (match_operand:V2DF 1 "nonimmediate_operand" "w")
>  	 (parallel [(match_operand:SI 2 "immediate_operand")]))
>         (match_operand:DF 3 "register_operand" "w")))]
>    "TARGET_SIMD"
> @@ -1406,7 +1406,7 @@ (define_insn "*aarch64_mla_elt<mode>"
>  	 (mult:VDQHS
>  	   (vec_duplicate:VDQHS
>  	      (vec_select:<VEL>
> -		(match_operand:VDQHS 1 "register_operand" "<h_con>")
> +		(match_operand:VDQHS 1 "nonimmediate_operand" "<h_con>")
>  		  (parallel [(match_operand:SI 2 "immediate_operand")])))
>  	   (match_operand:VDQHS 3 "register_operand" "w"))
>  	 (match_operand:VDQHS 4 "register_operand" "0")))]
> @@ -1424,7 +1424,7 @@ (define_insn "*aarch64_mla_elt_<vswap_width_name><mode>"
>  	 (mult:VDQHS
>  	   (vec_duplicate:VDQHS
>  	      (vec_select:<VEL>
> -		(match_operand:<VSWAP_WIDTH> 1 "register_operand" "<h_con>")
> +		(match_operand:<VSWAP_WIDTH> 1 "nonimmediate_operand" "<h_con>")
>  		  (parallel [(match_operand:SI 2 "immediate_operand")])))
>  	   (match_operand:VDQHS 3 "register_operand" "w"))
>  	 (match_operand:VDQHS 4 "register_operand" "0")))]
> @@ -1441,7 +1441,7 @@ (define_insn "aarch64_mla_n<mode>"
>  	(plus:VDQHS
>  	  (mult:VDQHS
>  	    (vec_duplicate:VDQHS
> -	      (match_operand:<VEL> 3 "register_operand" "<h_con>"))
> +	      (match_operand:<VEL> 3 "nonimmediate_operand" "<h_con>"))
>  	    (match_operand:VDQHS 2 "register_operand" "w"))
>  	  (match_operand:VDQHS 1 "register_operand" "0")))]
>   "TARGET_SIMD"
> @@ -1466,7 +1466,7 @@ (define_insn "*aarch64_mls_elt<mode>"
>  	 (mult:VDQHS
>  	   (vec_duplicate:VDQHS
>  	      (vec_select:<VEL>
> -		(match_operand:VDQHS 1 "register_operand" "<h_con>")
> +		(match_operand:VDQHS 1 "nonimmediate_operand" "<h_con>")
>  		  (parallel [(match_operand:SI 2 "immediate_operand")])))
>  	   (match_operand:VDQHS 3 "register_operand" "w"))))]
>   "TARGET_SIMD"
> @@ -1484,7 +1484,7 @@ (define_insn "*aarch64_mls_elt_<vswap_width_name><mode>"
>  	 (mult:VDQHS
>  	   (vec_duplicate:VDQHS
>  	      (vec_select:<VEL>
> -		(match_operand:<VSWAP_WIDTH> 1 "register_operand" "<h_con>")
> +		(match_operand:<VSWAP_WIDTH> 1 "nonimmediate_operand" "<h_con>")
>  		  (parallel [(match_operand:SI 2 "immediate_operand")])))
>  	   (match_operand:VDQHS 3 "register_operand" "w"))))]
>   "TARGET_SIMD"
> @@ -1501,7 +1501,7 @@ (define_insn "aarch64_mls_n<mode>"
>  	  (match_operand:VDQHS 1 "register_operand" "0")
>  	  (mult:VDQHS
>  	    (vec_duplicate:VDQHS
> -	      (match_operand:<VEL> 3 "register_operand" "<h_con>"))
> +	      (match_operand:<VEL> 3 "nonimmediate_operand" "<h_con>"))
>  	    (match_operand:VDQHS 2 "register_operand" "w"))))]
>    "TARGET_SIMD"
>    "mls\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[0]"
> @@ -2882,7 +2882,7 @@ (define_insn "*aarch64_fma4_elt<mode>"
>      (fma:VDQF
>        (vec_duplicate:VDQF
>  	(vec_select:<VEL>
> -	  (match_operand:VDQF 1 "register_operand" "<h_con>")
> +	  (match_operand:VDQF 1 "nonimmediate_operand" "<h_con>")
>  	  (parallel [(match_operand:SI 2 "immediate_operand")])))
>        (match_operand:VDQF 3 "register_operand" "w")
>        (match_operand:VDQF 4 "register_operand" "0")))]
> @@ -2899,7 +2899,7 @@ (define_insn "*aarch64_fma4_elt_<vswap_width_name><mode>"
>      (fma:VDQSF
>        (vec_duplicate:VDQSF
>  	(vec_select:<VEL>
> -	  (match_operand:<VSWAP_WIDTH> 1 "register_operand" "<h_con>")
> +	  (match_operand:<VSWAP_WIDTH> 1 "nonimmediate_operand" "<h_con>")
>  	  (parallel [(match_operand:SI 2 "immediate_operand")])))
>        (match_operand:VDQSF 3 "register_operand" "w")
>        (match_operand:VDQSF 4 "register_operand" "0")))]
> @@ -2915,7 +2915,7 @@ (define_insn "*aarch64_fma4_elt_from_dup<mode>"
>    [(set (match_operand:VMUL 0 "register_operand" "=w")
>      (fma:VMUL
>        (vec_duplicate:VMUL
> -	  (match_operand:<VEL> 1 "register_operand" "<h_con>"))
> +	  (match_operand:<VEL> 1 "nonimmediate_operand" "<h_con>"))
>        (match_operand:VMUL 2 "register_operand" "w")
>        (match_operand:VMUL 3 "register_operand" "0")))]
>    "TARGET_SIMD"
> @@ -2927,7 +2927,7 @@ (define_insn "*aarch64_fma4_elt_to_64v2df"
>    [(set (match_operand:DF 0 "register_operand" "=w")
>      (fma:DF
>  	(vec_select:DF
> -	  (match_operand:V2DF 1 "register_operand" "w")
> +	  (match_operand:V2DF 1 "nonimmediate_operand" "w")
>  	  (parallel [(match_operand:SI 2 "immediate_operand")]))
>        (match_operand:DF 3 "register_operand" "w")
>        (match_operand:DF 4 "register_operand" "0")))]
> @@ -2957,7 +2957,7 @@ (define_insn "*aarch64_fnma4_elt<mode>"
>          (match_operand:VDQF 3 "register_operand" "w"))
>        (vec_duplicate:VDQF
>  	(vec_select:<VEL>
> -	  (match_operand:VDQF 1 "register_operand" "<h_con>")
> +	  (match_operand:VDQF 1 "nonimmediate_operand" "<h_con>")
>  	  (parallel [(match_operand:SI 2 "immediate_operand")])))
>        (match_operand:VDQF 4 "register_operand" "0")))]
>    "TARGET_SIMD"
> @@ -2975,7 +2975,7 @@ (define_insn "*aarch64_fnma4_elt_<vswap_width_name><mode>"
>          (match_operand:VDQSF 3 "register_operand" "w"))
>        (vec_duplicate:VDQSF
>  	(vec_select:<VEL>
> -	  (match_operand:<VSWAP_WIDTH> 1 "register_operand" "<h_con>")
> +	  (match_operand:<VSWAP_WIDTH> 1 "nonimmediate_operand" "<h_con>")
>  	  (parallel [(match_operand:SI 2 "immediate_operand")])))
>        (match_operand:VDQSF 4 "register_operand" "0")))]
>    "TARGET_SIMD"
> @@ -2992,7 +2992,7 @@ (define_insn "*aarch64_fnma4_elt_from_dup<mode>"
>        (neg:VMUL
>          (match_operand:VMUL 2 "register_operand" "w"))
>        (vec_duplicate:VMUL
> -	(match_operand:<VEL> 1 "register_operand" "<h_con>"))
> +	(match_operand:<VEL> 1 "nonimmediate_operand" "<h_con>"))
>        (match_operand:VMUL 3 "register_operand" "0")))]
>    "TARGET_SIMD"
>    "fmls\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
> @@ -3003,7 +3003,7 @@ (define_insn "*aarch64_fnma4_elt_to_64v2df"
>    [(set (match_operand:DF 0 "register_operand" "=w")
>      (fma:DF
>        (vec_select:DF
> -	(match_operand:V2DF 1 "register_operand" "w")
> +	(match_operand:V2DF 1 "nonimmediate_operand" "w")
>  	(parallel [(match_operand:SI 2 "immediate_operand")]))
>        (neg:DF
>          (match_operand:DF 3 "register_operand" "w"))
> @@ -4934,7 +4934,7 @@ (define_insn "*aarch64_mulx_elt_<vswap_width_name><mode>"
>  	 [(match_operand:VDQSF 1 "register_operand" "w")
>  	  (vec_duplicate:VDQSF
>  	   (vec_select:<VEL>
> -	    (match_operand:<VSWAP_WIDTH> 2 "register_operand" "w")
> +	    (match_operand:<VSWAP_WIDTH> 2 "nonimmediate_operand" "w")
>  	    (parallel [(match_operand:SI 3 "immediate_operand" "i")])))]
>  	 UNSPEC_FMULX))]
>    "TARGET_SIMD"
> @@ -4953,7 +4953,7 @@ (define_insn "*aarch64_mulx_elt<mode>"
>  	 [(match_operand:VDQF 1 "register_operand" "w")
>  	  (vec_duplicate:VDQF
>  	   (vec_select:<VEL>
> -	    (match_operand:VDQF 2 "register_operand" "w")
> +	    (match_operand:VDQF 2 "nonimmediate_operand" "w")
>  	    (parallel [(match_operand:SI 3 "immediate_operand" "i")])))]
>  	 UNSPEC_FMULX))]
>    "TARGET_SIMD"
> @@ -4971,7 +4971,7 @@ (define_insn "*aarch64_mulx_elt_from_dup<mode>"
>  	(unspec:VHSDF
>  	 [(match_operand:VHSDF 1 "register_operand" "w")
>  	  (vec_duplicate:VHSDF
> -	    (match_operand:<VEL> 2 "register_operand" "<h_con>"))]
> +	    (match_operand:<VEL> 2 "nonimmediate_operand" "<h_con>"))]
>  	 UNSPEC_FMULX))]
>    "TARGET_SIMD"
>    "fmulx\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]";
> @@ -4987,7 +4987,7 @@ (define_insn "*aarch64_vgetfmulx<mode>"
>  	(unspec:<VEL>
>  	 [(match_operand:<VEL> 1 "register_operand" "w")
>  	  (vec_select:<VEL>
> -	   (match_operand:VDQF 2 "register_operand" "w")
> +	   (match_operand:VDQF 2 "nonimmediate_operand" "w")
>  	    (parallel [(match_operand:SI 3 "immediate_operand" "i")]))]
>  	 UNSPEC_FMULX))]
>    "TARGET_SIMD"
> @@ -6768,7 +6768,7 @@ (define_insn "*sqrt<mode>2"
>  (define_insn "aarch64_simd_ld2<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_2Q 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_2Q [
> -	  (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_2Q 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD2))]
>    "TARGET_SIMD"
>    "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> @@ -6778,7 +6778,7 @@ (define_insn "aarch64_simd_ld2<vstruct_elt>"
>  (define_insn "aarch64_simd_ld2r<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_2QD [
> -	  (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:BLK 1 "memory_operand" "Utv")]
>            UNSPEC_LD2_DUP))]
>    "TARGET_SIMD"
>    "ld2r\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> @@ -6788,7 +6788,7 @@ (define_insn "aarch64_simd_ld2r<vstruct_elt>"
>  (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_2QD [
> -		(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> +		(match_operand:BLK 1 "memory_operand" "Utv")
>  		(match_operand:VSTRUCT_2QD 2 "register_operand" "0")
>  		(match_operand:SI 3 "immediate_operand" "i")]
>  		UNSPEC_LD2_LANE))]
> @@ -6804,7 +6804,7 @@ (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>  (define_expand "vec_load_lanes<mode><vstruct_elt>"
>    [(set (match_operand:VSTRUCT_2Q 0 "register_operand")
>  	(unspec:VSTRUCT_2Q [
> -		(match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand")]
> +		(match_operand:VSTRUCT_2Q 1 "memory_operand")]
>  		UNSPEC_LD2))]
>    "TARGET_SIMD"
>  {
> @@ -6822,7 +6822,7 @@ (define_expand "vec_load_lanes<mode><vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_simd_st2<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_2Q 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_2Q 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_2Q [
>  		(match_operand:VSTRUCT_2Q 1 "register_operand" "w")]
>                  UNSPEC_ST2))]
> @@ -6833,7 +6833,7 @@ (define_insn "aarch64_simd_st2<vstruct_elt>"
>  
>  ;; RTL uses GCC vector extension indices, so flip only for assembly.
>  (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
> -  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Utv")
>  	(unspec:BLK [(match_operand:VSTRUCT_2QD 1 "register_operand" "w")
>  		     (match_operand:SI 2 "immediate_operand" "i")]
>  		     UNSPEC_ST2_LANE))]
> @@ -6847,7 +6847,7 @@ (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
>  )
>  
>  (define_expand "vec_store_lanes<mode><vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_2Q 0 "aarch64_simd_struct_operand")
> +  [(set (match_operand:VSTRUCT_2Q 0 "memory_operand")
>  	(unspec:VSTRUCT_2Q [(match_operand:VSTRUCT_2Q 1 "register_operand")]
>                     UNSPEC_ST2))]
>    "TARGET_SIMD"
> @@ -6868,7 +6868,7 @@ (define_expand "vec_store_lanes<mode><vstruct_elt>"
>  (define_insn "aarch64_simd_ld3<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3Q 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_3Q [
> -	  (match_operand:VSTRUCT_3Q 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_3Q 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD3))]
>    "TARGET_SIMD"
>    "ld3\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> @@ -6878,7 +6878,7 @@ (define_insn "aarch64_simd_ld3<vstruct_elt>"
>  (define_insn "aarch64_simd_ld3r<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_3QD [
> -	  (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:BLK 1 "memory_operand" "Utv")]
>            UNSPEC_LD3_DUP))]
>    "TARGET_SIMD"
>    "ld3r\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> @@ -6888,7 +6888,7 @@ (define_insn "aarch64_simd_ld3r<vstruct_elt>"
>  (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_3QD [
> -		(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> +		(match_operand:BLK 1 "memory_operand" "Utv")
>  		(match_operand:VSTRUCT_3QD 2 "register_operand" "0")
>  		(match_operand:SI 3 "immediate_operand" "i")]
>  		UNSPEC_LD3_LANE))]
> @@ -6904,7 +6904,7 @@ (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>  (define_expand "vec_load_lanes<mode><vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3Q 0 "register_operand")
>  	(unspec:VSTRUCT_3Q [
> -		(match_operand:VSTRUCT_3Q 1 "aarch64_simd_struct_operand")]
> +		(match_operand:VSTRUCT_3Q 1 "memory_operand")]
>  		UNSPEC_LD3))]
>    "TARGET_SIMD"
>  {
> @@ -6922,7 +6922,7 @@ (define_expand "vec_load_lanes<mode><vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_simd_st3<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_3Q 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_3Q 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_3Q [(match_operand:VSTRUCT_3Q 1 "register_operand" "w")]
>                     UNSPEC_ST3))]
>    "TARGET_SIMD"
> @@ -6932,7 +6932,7 @@ (define_insn "aarch64_simd_st3<vstruct_elt>"
>  
>  ;; RTL uses GCC vector extension indices, so flip only for assembly.
>  (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
> -  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Utv")
>  	(unspec:BLK [(match_operand:VSTRUCT_3QD 1 "register_operand" "w")
>  		     (match_operand:SI 2 "immediate_operand" "i")]
>  		     UNSPEC_ST3_LANE))]
> @@ -6946,7 +6946,7 @@ (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
>  )
>  
>  (define_expand "vec_store_lanes<mode><vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_3Q 0 "aarch64_simd_struct_operand")
> +  [(set (match_operand:VSTRUCT_3Q 0 "memory_operand")
>  	(unspec:VSTRUCT_3Q [
>  		(match_operand:VSTRUCT_3Q 1 "register_operand")]
>                  UNSPEC_ST3))]
> @@ -6968,7 +6968,7 @@ (define_expand "vec_store_lanes<mode><vstruct_elt>"
>  (define_insn "aarch64_simd_ld4<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4Q 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_4Q [
> -	  (match_operand:VSTRUCT_4Q 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_4Q 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD4))]
>    "TARGET_SIMD"
>    "ld4\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> @@ -6978,7 +6978,7 @@ (define_insn "aarch64_simd_ld4<vstruct_elt>"
>  (define_insn "aarch64_simd_ld4r<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_4QD [
> -	  (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:BLK 1 "memory_operand" "Utv")]
>            UNSPEC_LD4_DUP))]
>    "TARGET_SIMD"
>    "ld4r\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> @@ -6988,7 +6988,7 @@ (define_insn "aarch64_simd_ld4r<vstruct_elt>"
>  (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_4QD [
> -		(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> +		(match_operand:BLK 1 "memory_operand" "Utv")
>  		(match_operand:VSTRUCT_4QD 2 "register_operand" "0")
>  		(match_operand:SI 3 "immediate_operand" "i")]
>  		UNSPEC_LD4_LANE))]
> @@ -7004,7 +7004,7 @@ (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>  (define_expand "vec_load_lanes<mode><vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4Q 0 "register_operand")
>  	(unspec:VSTRUCT_4Q [
> -		(match_operand:VSTRUCT_4Q 1 "aarch64_simd_struct_operand")]
> +		(match_operand:VSTRUCT_4Q 1 "memory_operand")]
>  		UNSPEC_LD4))]
>    "TARGET_SIMD"
>  {
> @@ -7022,7 +7022,7 @@ (define_expand "vec_load_lanes<mode><vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_simd_st4<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_4Q 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_4Q 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_4Q [
>  		(match_operand:VSTRUCT_4Q 1 "register_operand" "w")]
>                  UNSPEC_ST4))]
> @@ -7033,7 +7033,7 @@ (define_insn "aarch64_simd_st4<vstruct_elt>"
>  
>  ;; RTL uses GCC vector extension indices, so flip only for assembly.
>  (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
> -  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Utv")
>  	(unspec:BLK [(match_operand:VSTRUCT_4QD 1 "register_operand" "w")
>  		     (match_operand:SI 2 "immediate_operand" "i")]
>  		     UNSPEC_ST4_LANE))]
> @@ -7047,7 +7047,7 @@ (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
>  )
>  
>  (define_expand "vec_store_lanes<mode><vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_4Q 0 "aarch64_simd_struct_operand")
> +  [(set (match_operand:VSTRUCT_4Q 0 "memory_operand")
>  	(unspec:VSTRUCT_4Q [(match_operand:VSTRUCT_4Q 1 "register_operand")]
>                     UNSPEC_ST4))]
>    "TARGET_SIMD"
> @@ -7138,7 +7138,7 @@ (define_expand "aarch64_ld1x3<vstruct_elt>"
>  (define_insn "aarch64_ld1_x3_<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w")
>          (unspec:VSTRUCT_3QD
> -	  [(match_operand:VSTRUCT_3QD 1 "aarch64_simd_struct_operand" "Utv")]
> +	  [(match_operand:VSTRUCT_3QD 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD1))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> @@ -7158,7 +7158,7 @@ (define_expand "aarch64_ld1x4<vstruct_elt>"
>  (define_insn "aarch64_ld1_x4_<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_4QD
> -	  [(match_operand:VSTRUCT_4QD 1 "aarch64_simd_struct_operand" "Utv")]
> +	  [(match_operand:VSTRUCT_4QD 1 "memory_operand" "Utv")]
>  	UNSPEC_LD1))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> @@ -7176,7 +7176,7 @@ (define_expand "aarch64_st1x2<vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_st1_x2_<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_2QD 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_2QD 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_2QD
>  		[(match_operand:VSTRUCT_2QD 1 "register_operand" "w")]
>  		UNSPEC_ST1))]
> @@ -7196,7 +7196,7 @@ (define_expand "aarch64_st1x3<vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_st1_x3_<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_3QD 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_3QD 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_3QD
>  		[(match_operand:VSTRUCT_3QD 1 "register_operand" "w")]
>  		UNSPEC_ST1))]
> @@ -7216,7 +7216,7 @@ (define_expand "aarch64_st1x4<vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_st1_x4_<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_4QD 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_4QD 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_4QD
>  		[(match_operand:VSTRUCT_4QD 1 "register_operand" "w")]
>  		UNSPEC_ST1))]
> @@ -7268,7 +7268,7 @@ (define_insn "*aarch64_movv8di"
>  (define_insn "aarch64_be_ld1<mode>"
>    [(set (match_operand:VALLDI_F16 0	"register_operand" "=w")
>  	(unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1
> -			     "aarch64_simd_struct_operand" "Utv")]
> +			     "memory_operand" "Utv")]
>  	UNSPEC_LD1))]
>    "TARGET_SIMD"
>    "ld1\\t{%0<Vmtype>}, %1"
> @@ -7276,7 +7276,7 @@ (define_insn "aarch64_be_ld1<mode>"
>  )
>  
>  (define_insn "aarch64_be_st1<mode>"
> -  [(set (match_operand:VALLDI_F16 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VALLDI_F16 0 "memory_operand" "=Utv")
>  	(unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 "register_operand" "w")]
>  	UNSPEC_ST1))]
>    "TARGET_SIMD"
> @@ -7551,7 +7551,7 @@ (define_expand "aarch64_ld<nregs>r<vstruct_elt>"
>  (define_insn "aarch64_ld2<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_2DNX 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_2DNX [
> -	  (match_operand:VSTRUCT_2DNX 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_2DNX 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD2_DREG))]
>    "TARGET_SIMD"
>    "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> @@ -7561,7 +7561,7 @@ (define_insn "aarch64_ld2<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld2<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_2DX 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_2DX [
> -	  (match_operand:VSTRUCT_2DX 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_2DX 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD2_DREG))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.1d - %T0.1d}, %1"
> @@ -7571,7 +7571,7 @@ (define_insn "aarch64_ld2<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld3<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_3DNX 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_3DNX [
> -	  (match_operand:VSTRUCT_3DNX 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_3DNX 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD3_DREG))]
>    "TARGET_SIMD"
>    "ld3\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> @@ -7581,7 +7581,7 @@ (define_insn "aarch64_ld3<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld3<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_3DX 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_3DX [
> -	  (match_operand:VSTRUCT_3DX 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_3DX 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD3_DREG))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.1d - %U0.1d}, %1"
> @@ -7591,7 +7591,7 @@ (define_insn "aarch64_ld3<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld4<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_4DNX 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_4DNX [
> -	  (match_operand:VSTRUCT_4DNX 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_4DNX 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD4_DREG))]
>    "TARGET_SIMD"
>    "ld4\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> @@ -7601,7 +7601,7 @@ (define_insn "aarch64_ld4<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld4<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_4DX 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_4DX [
> -	  (match_operand:VSTRUCT_4DX 1 "aarch64_simd_struct_operand" "Utv")]
> +	  (match_operand:VSTRUCT_4DX 1 "memory_operand" "Utv")]
>  	  UNSPEC_LD4_DREG))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.1d - %V0.1d}, %1"
> @@ -7841,7 +7841,7 @@ (define_insn "aarch64_rev<REVERSE:rev_op><mode>"
>  )
>  
>  (define_insn "aarch64_st2<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_2DNX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_2DNX 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_2DNX [
>  		(match_operand:VSTRUCT_2DNX 1 "register_operand" "w")]
>  		UNSPEC_ST2))]
> @@ -7851,7 +7851,7 @@ (define_insn "aarch64_st2<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st2<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_2DX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_2DX 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_2DX [
>  		(match_operand:VSTRUCT_2DX 1 "register_operand" "w")]
>  		UNSPEC_ST2))]
> @@ -7861,7 +7861,7 @@ (define_insn "aarch64_st2<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st3<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_3DNX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_3DNX 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_3DNX [
>  		(match_operand:VSTRUCT_3DNX 1 "register_operand" "w")]
>  		UNSPEC_ST3))]
> @@ -7871,7 +7871,7 @@ (define_insn "aarch64_st3<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st3<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_3DX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_3DX 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_3DX [
>  		(match_operand:VSTRUCT_3DX 1 "register_operand" "w")]
>  		UNSPEC_ST3))]
> @@ -7881,7 +7881,7 @@ (define_insn "aarch64_st3<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st4<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_4DNX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_4DNX 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_4DNX [
>  		(match_operand:VSTRUCT_4DNX 1 "register_operand" "w")]
>  		UNSPEC_ST4))]
> @@ -7891,7 +7891,7 @@ (define_insn "aarch64_st4<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st4<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_4DX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_4DX 0 "memory_operand" "=Utv")
>  	(unspec:VSTRUCT_4DX [
>  		(match_operand:VSTRUCT_4DX 1 "register_operand" "w")]
>  		UNSPEC_ST4))]
> @@ -7974,7 +7974,7 @@ (define_expand "vec_init<mode><Vhalf>"
>  (define_insn "*aarch64_simd_ld1r<mode>"
>    [(set (match_operand:VALL_F16 0 "register_operand" "=w")
>  	(vec_duplicate:VALL_F16
> -	  (match_operand:<VEL> 1 "aarch64_simd_struct_operand" "Utv")))]
> +	  (match_operand:<VEL> 1 "memory_operand" "Utv")))]
>    "TARGET_SIMD"
>    "ld1r\\t{%0.<Vtype>}, %1"
>    [(set_attr "type" "neon_load1_all_lanes")]
> @@ -7983,7 +7983,7 @@ (define_insn "*aarch64_simd_ld1r<mode>"
>  (define_insn "aarch64_simd_ld1<vstruct_elt>_x2"
>    [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
>  	(unspec:VSTRUCT_2QD [
> -	    (match_operand:VSTRUCT_2QD 1 "aarch64_simd_struct_operand" "Utv")]
> +	    (match_operand:VSTRUCT_2QD 1 "memory_operand" "Utv")]
>  	    UNSPEC_LD1))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index c308015ac2c13d24cd6bcec71247ec45df8cf5e6..6b70a364530c8108457091bfec12fe549f722149 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -494,10 +494,6 @@ (define_predicate "aarch64_simd_reg_or_minus_one"
>    (ior (match_operand 0 "register_operand")
>         (match_operand 0 "aarch64_simd_imm_minus_one")))
>  
> -(define_predicate "aarch64_simd_struct_operand"
> -  (and (match_code "mem")
> -       (match_test "TARGET_SIMD && aarch64_simd_mem_operand_p (op)")))
> -
>  ;; Like general_operand but allow only valid SIMD addressing modes.
>  (define_predicate "aarch64_simd_general_operand"
>    (and (match_operand 0 "general_operand")
> diff --git a/gcc/testsuite/gcc.target/aarch64/vld1r.c b/gcc/testsuite/gcc.target/aarch64/vld1r.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..72c505c403e9e239771379b7cadd8a9473f06113
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vld1r.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include <arm_neon.h>
> +
> +/*
> +** f1:
> +** 	add	x0, x0, 1
> +** 	ld1r	{v0.8b}, \[x0\]
> +** 	ret
> +*/
> +uint8x8_t f1(const uint8_t *in) {
> +    return vld1_dup_u8(&in[1]);
> +}
> +
> +/*
> +** f2:
> +** 	ldr	s1, \[x0, 4\]
> +** 	fmla	v0.4s, v0.4s, v1.s\[0\]
> +** 	ret
> +*/
> +float32x4_t f2(const float32_t *in, float32x4_t a) {
> +    float32x4_t dup = vld1q_dup_f32(&in[1]);
> +    return vfmaq_laneq_f32 (a, a, dup, 1);
> +}

Thread overview: 12+ messages
2022-06-08  9:38 Tamar Christina
2022-06-08 10:31 ` Richard Sandiford [this message]
2022-06-08 13:51   ` Tamar Christina
2022-06-08 14:35     ` Richard Sandiford
2022-06-09  7:42       ` Tamar Christina
2022-06-09  8:22         ` Richard Sandiford
2022-06-09  8:43           ` Tamar Christina
2022-06-13  8:00       ` Richard Biener
2022-06-13  8:26         ` Richard Sandiford
2022-06-13  8:38           ` Richard Biener
2022-06-13  9:51             ` Tamar Christina
2022-06-13 11:50               ` Richard Biener
